PROTEIN-PROTEIN INTERACTIONS
BETWEEN SHIGELLA FLEXNERI POLYPEPTIDES AND MAMMALIAN POLYPEPTIDES

PRIORITY 

[0001] This application claims priority on the basis of United States Provisional
Application No. 60/261,130, filed January 12, 2001, the contents of which are hereby 
incorporated by reference. 
BACKGROUND OF THE INVENTION 

[0002] Most biological processes involve specific protein-protein interactions.
Protein-protein interactions enable two or more proteins to associate. A large number of
non-covalent bonds form between the proteins when two protein surfaces are precisely
matched, and these bonds account for the specificity of recognition. Thus, protein-protein
interactions are involved, for example, in the assembly of enzyme subunits, in antibody-antigen
recognition, in the formation of biochemical complexes, in the correct folding of proteins, in
the metabolism of proteins, in the transport of proteins, in the localization of proteins, in
protein turnover, in post-translational modifications, in the core structures of viruses and in
signal transduction.

[0003] General methodologies to identify interacting proteins or to study these
interactions have been developed. Among these methods is the two-hybrid system,
originally developed by Fields and co-workers and described, for example, in U.S. Patent 
Nos. 5,283,173, 5,468,614 and 5,667,973, which are hereby incorporated by reference. 
[0004] The earliest and simplest two-hybrid system, which served as the basis for the
development of other versions, is an in vivo assay between two specifically constructed
proteins. The first protein, known in the art as the "bait protein," is a chimeric protein that
binds to a site on DNA upstream of a reporter gene by means of a DNA-binding domain, or
BD. Commonly, the binding domain is the DNA-binding domain of either Gal4 or the native
E. coli LexA protein, and the sites placed upstream of the reporter are Gal4 binding sites or
LexA operators, respectively.

[0005] The second protein is also a chimeric protein known as the "prey" in the art. This 
second chimeric protein carries an activation domain or AD. This activation domain is 
typically derived from Gal4, from VP16 or from B42. 

[0006] Besides the two-hybrid system, other improved systems have been developed
to detect protein-protein interactions. For example, a two-hybrid plus one system was
developed that allows two proteins to be used as bait to screen available cDNA libraries and
detect a third partner. This method permits the detection of interactions between proteins
that are part of a larger protein complex, such as the RNA polymerase II holoenzyme and the
TFIIH or TFIID complexes. Therefore, this method, in general, permits the detection of
ternary complex formation as well as of inhibitors preventing the interaction between the two
previously defined fused proteins.

[0007] Another advantage of the two-hybrid plus one system is that it can either allow or
prevent the formation of the transcriptional activator, since the third partner can be expressed
from a conditional promoter such as the methionine-repressed Met25 promoter, which is
induced in medium lacking methionine. The methionine-regulated promoter provides an
excellent control for evaluating the activating or inhibitory properties of the third partner,
owing to its "on" and "off" switch for the formation of the transcriptional activator. The
three-hybrid method is described, for example, in Tirode et al., The Journal of Biological
Chemistry, 272, No. 37 pp. 22995-22999 (1997), incorporated herein by reference.

[0008] Besides the two-hybrid and two-hybrid plus one systems, yet another variant is
described in Vidal et al., Proc. Natl. Acad. Sci. USA 93 pgs. 10315-10320: the reverse
two- and one-hybrid systems, in which a collection of molecules can be screened for those
that inhibit specific protein-protein or protein-DNA interactions, respectively.

[0009] A summary of the available methodologies for detecting protein-protein 
interactions is described in Vidal and Legrain, Nucleic Acids Research Vol. 27, No. 4 
pgs. 919-929 (1999) and Legrain and Selig, FEBS Letters 480 pgs. 32-36 (2000), which
references are incorporated herein by reference. 

[0010] However, the above conventionally used approaches, and especially the
commonly used two-hybrid methods, have their drawbacks. For example, it is known in the
art that, more often than not, false positives and false negatives occur in the screening
method. In fact, a doctrine has been developed in this field for interpreting the results, and in
common practice an additional technique, such as co-immunoprecipitation or gradient
sedimentation of the putative interactors from the appropriate cell or tissue type, is generally
performed. The methods used for interpreting the results are described by Brent and
Finley, Jr. in Ann. Rev. Genet. 31 pgs. 663-704 (1997). Thus, the interpretation of data
obtained with the conventional systems is highly questionable.

[0011] One method to overcome the difficulties encountered with the methods of the
prior art is described in WO 99/42612, incorporated herein by reference. This method is
similar to the two-hybrid system described in the prior art in that it also uses bait and prey
polypeptides. It differs, however, in that it includes a step of mating at least one first
haploid recombinant yeast cell containing the prey polynucleotide to be assayed with a
second haploid recombinant yeast cell containing the bait polynucleotide. Of course, the
person skilled in the art would appreciate that either the first recombinant yeast cell or the
second recombinant yeast cell also contains at least one detectable reporter gene that is
activated by a polypeptide including a transcriptional activation domain.



[0012] The method described in WO 99/42612 permits more prey polynucleotides to be
screened with a given bait polynucleotide in a single step than the prior art systems do,
owing to the cell-to-cell mating strategy between haploid yeast cells. Furthermore, this
method is more thorough, more reproducible and more sensitive. Thus, false negatives
and/or false positives are kept to a minimum compared with the conventional prior art
methods.

[0013] The genus Shigella includes four species (major serogroups): S. dysenteriae
(Grp. A), S. flexneri (Grp. B), S. boydii (Grp. C) and S. sonnei (Grp. D), as classified in
Bergey's Manual of Systematic Bacteriology (N. R. Krieg, ed., pp. 423-427 (1984)). The
genera Shigella and Escherichia are phylogenetically closely related. Brenner and others
have suggested, based on DNA/DNA reassociation studies, that the two are more correctly
considered sibling species (D. J. Brenner et al., International J. Systematic Bacteriology,
23:1-7 (1973)). These studies showed that Shigella species are on average 80-89% related
to E. coli at the DNA level. Also, the degree of relatedness between Shigella species is on
average 80-89%.

[0014] The genus Shigella is pathogenic in humans; it causes bacillary dysentery at an
infectious dose of 10 to 100 organisms.

[0015] Shigellosis, or bacillary dysentery, is a disease that is endemic throughout the
world. The disease presents a particularly serious public health problem in tropical regions
and developing countries, where Shigella dysenteriae and S. flexneri predominate. In
industrialized countries, the principal etiologic agent is S. sonnei, although sporadic cases of
shigellosis are encountered due to S. flexneri, S. boydii and certain enteroinvasive
Escherichia coli.

[0016] The primary step in the pathogenesis of bacillary dysentery is invasion of the 
human colonic mucosa by Shigella (Labrec, E. H., H. Schneider, T. J. Magnani, and S. B. 
Formal. 1964. Epithelial cell penetration as an essential step in the pathogenesis of bacillary 
dysentery. J. Bacteriol. 88:1503). Mucosal invasion encompasses several steps which 
include penetration of the bacteria into epithelial cells, intracellular multiplication, killing of 
host cells, and final spreading to adjacent cells and to connective tissue (Formal, S. B., T. L. 
Hale, and P. J. Sansonetti. 1983. Invasive enteric pathogens. Rev. Infect. Dis. 5:S702, Rout, 
W. R., S. B. Formal, R. A. Giannella, and G. J. Dammin. 1975. The pathophysiology of 
Shigella diarrhea in the Rhesus monkey; intestinal transport, morphology and bacteriological 
studies. Gastroenterology 68:270, Takeuchi, A., H. Spring, E. H. LaBrec, and S. B. Formal. 
1965. Experimental acute colitis in the Rhesus monkey following peroral infection with 
Shigella flexneri. Am. J. Pathol. 52:503, Takeuchi, A. 1967. Electron microscope studies of 
experimental Salmonella infection. I. Penetration into cells of the intestinal epithelium by 
Salmonella typhimurium. Am. J. Pathol. 47:1011). The overall process, which is usually
limited to the mucosal surface, leads to a strong inflammatory reaction that is responsible
for abscesses and ulcerations (Labrec, E. H., H. Schneider, T. J. Magnani, and S. B. Formal. 
1964. Epithelial cell penetration as an essential step in the pathogenesis of bacillary 
dysentery. J. Bacteriol. 88:1503., Rout, W. R., S. B. Formal, R. A. Giannella, and G. J. 
Dammin. 1975. The pathophysiology of Shigella diarrhea in the Rhesus monkey; intestinal 
transport, morphology and bacteriological studies. Gastroenterology 68:270, Takeuchi, A., H. 
Spring, E. H. LaBrec, and S. B. Formal. 1965. Experimental acute colitis in the Rhesus 
monkey following peroral infection with Shigella flexneri. Am. J. Pathol. 52:503). 
[0017] Even though dysentery is characteristic of shigellosis, it may be preceded by 
watery diarrhea. Diarrhea appears to be the result of disturbances in colonic reabsorption 
and increased jejunal secretion whereas dysentery is a purely colonic process (Kinsey, M. 
D., S. B. Formal, G. J. Dammin, and R. A. Giannella. 1976. Fluid and electrolyte transport in 
Rhesus monkeys challenged intracecally with Shigella flexneri 2a. Infect. Immun. 14:368).
Shigellosis may also lead to severe complications. These include toxic megacolon, leukemoid
reactions and hemolytic-uremic syndrome ("HUS"). The latter is a major cause of mortality
from shigellosis in developing areas (Gianantonio, C., H. Vitacco, F. Mendilaharzu, A. Rutty,
and J. Mendilaharzu. 1964. The hemolytic-uremic syndrome. J. Pediatr. 64:478, Koster, F.,
J. Levin, L. Walker, K. S. K. Tung, R. H. Gilman, M. M. Rajaman, M. A. Majid, S. Islam,
and R. C. Williams Jr. 1977. Hemolytic-uremic syndrome after shigellosis. Relation to
endotoxin and circulating immune complexes. N. Engl. J. Med. 298:927).

[0018] The role of Shiga-toxin produced at high level by S. dysenteriae 1 (Conradi, H.,
1903. Ueber lösliche, durch aseptische Autolyse erhaltene Giftstoffe von Ruhr- und
Typhusbazillen. Dtsch. Med. Wochenschr. 29:26) and of Shiga-like toxins ("SLT") produced at low
level by S. flexneri and S. sonnei (Keusch, G. T., and M. Jacewicz. 1977. The pathogenesis 
of Shigella diarrhea. VI. Toxin and antitoxin in Shigella flexneri and Shigella sonnei infections 
in humans. J. Infect. Dis. 135:552) in the four major stages of shigellosis (i.e., invasion of 
individual epithelial cells, tissue invasion, diarrhea and systemic symptoms) is not well 
understood. For review see O'Brien and Holmes (O'Brien, A. D., and R. K. Holmes. 1987. 
Shiga and Shiga-like toxins. Microbiol. Rev. 51:206). Plasmids of 180-220 kilobases ("kb") 
are essential in all Shigella species for invasion of individual epithelial cells (Rout, W. R., S. 
B. Formal, R. A. Giannella, and G. J. Dammin. 1975. The pathophysiology of Shigella 
diarrhea in the Rhesus monkey; intestinal transport, morphology and bacteriological studies. 
Gastroenterology 68:270, Sansonetti, P. J., D. J. Kopecko, and S. B. Formal. 1981. Shigella
sonnei plasmids: evidence that a large plasmid is necessary for virulence. Infect. Immun.
34:75, Sansonetti, P. J., T. L. Hale, G. J. Dammin, C. Kapper, H. H. Collins Jr., and S. B.
Formal. 1983. Alterations in the pathogenesis of Escherichia coli K12 after transfer of
plasmids and chromosomal genes from Shigella flexneri. Infect. Immun. 39:1392). This stage
includes entry, intracellular multiplication and early killing of host cells (Clerc, P., A. Ryter, J.
Mounier, and P. J. Sansonetti. 1987. Plasmid-mediated early killing of eucaryotic cells by 
Shigella flexneri as studied by infection of J774 macrophages. Infect. Immun. 55:521, Clerc, 
P., and P. J. Sansonetti. 1987. Entry of Shigella flexneri into HeLa cells: Evidence for 
directed phagocytosis involving actin polymerization and myosin accumulation. Infect. 
Immun. 55:2681). The role of Shiga-toxin and SLT at this stage is unclear. 
[0019] Recent evidence indicates that Shiga-toxin is cytotoxic for primary cultures of 
human colonic cells (Moyer, M. P., P. S. Dixon, S. W. Rothman, and J. E. Brown. 1987. 
Cytotoxicity of Shiga toxin for human colonic and ileal epithelial cells. Infect. Immun. 
55:1533). Tissue invasion requires additional chromosomally encoded products, including
smooth lipopolysaccharides ("LPS") (Sansonetti, P. J., T. L. Hale, G. J. Dammin,
C. Kapper, H. H. Collins Jr., and S. B. Formal. 1983. Alterations in the pathogenesis of
Escherichia coli K12 after transfer of plasmids and chromosomal genes from Shigella
flexneri. Infect. Immun. 39:1392), the uncharacterized product of the Kcp locus, and
aerobactin. A region of the S. flexneri chromosome necessary for fluid production in rabbit
ileal loops has been localized to the rha-mtl region and near the lysine decarboxylase
locus (Sansonetti, P. J., T. L. Hale, G. J. Dammin, C. Kapper, H. H. Collins Jr., and S. B.
Formal. 1983. Alterations in the pathogenesis of Escherichia coli K12 after transfer of
plasmids and chromosomal genes from Shigella flexneri. Infect. Immun. 39:1392). However,
no evidence has been adduced to show that the ability to cause fluid accumulation is due to
the SLT of S. flexneri. Thus, the role of Shiga-toxin in causing the systemic complications of
shigellosis is still hypothetical. Nevertheless, Shiga-toxin may mediate vascular damage, since
capillary lesions observed in HUS resemble those observed in cerebral vessels of animals
injected with this toxin (Bridgewater, F. A. I., R. S. Morgan, K. E. K. Rowson, and G. P.
Wright. 1955. The neurotoxin of Shigella shigae. Morphological and functional lesions
produced in the central nervous system of rabbits. Br. J. Exp. Pathol. 36:447, Cavanagh, J.
B., J. G. Howard, and J. L. Whitby. 1956. The neurotoxin of Shigella shigae. A comparative
study of the effects produced in various laboratory animals. Br. J. Exp. Pathol. 37:272).
[0020] As described above, the genera Shigella and Escherichia are phylogenetically
closely related. Furthermore, the pathogenesis of enteroinvasive E. coli is very similar to
that of Shigella. In both, dysentery results from invasion of the colonic epithelial cells 
followed by intracellular multiplication which leads to bloody, mucous discharge with scanty 
diarrhea. 

[0021] Pathogenic E. coli serotypes are collectively referred to as Enterovirulent E. coli 
(EVEC) (J. R. Lupski, et al., J. Infectious Diseases, 157:1120-1123 (1988); M. M. Levine, J. 
Infectious Diseases, 155:377-389 (1987); M. A. Karmali, Clinical Microbiology Reviews, 
2:15-38 (1989)). This group includes at least 5 subclasses of E. coli, each having a 
characteristic pathogenesis pathway resulting in diarrheal disease. The subclasses include
Enterotoxigenic E. coli (ETEC), Verotoxin-Producing E. coli (VTEC), Enteropathogenic E. 
coli (EPEC), Enteroadherent E. coli (EAEC) and Enteroinvasive E. coli (EIEC). The VTEC 
include Enterohemorrhagic E. coli (EHEC) since these produce verotoxins. 
[0022] Thus, detection of Shigella and EIEC is important in various medical contexts. 
For example, the presence of either Shigella or EIEC in stool samples is indicative of 
gastroenteritis, and the ability to screen for their presence is useful in treating and controlling 
that disease. Detection of Shigella or EIEC in any possible transmission vehicle such as food 
is also important to avoid spread of gastroenteritis. 

[0023] There is therefore a great need to construct a protein interaction map between
Shigella polypeptides and human polypeptides in order to understand the mechanisms of
Shigella pathogenesis, to identify drug targets for treating Shigella-associated diseases and
to develop means of detecting Shigella.

SUMMARY OF THE PRESENT INVENTION

[0024] Thus, it is an object of the present invention to identify protein-protein interactions
between Shigella polypeptides and mammalian, preferably human, polypeptides.

[0025] It is another object of the present invention to identify protein-protein interactions
between Shigella polypeptides and mammalian, preferably human, polypeptides for the
development of more effective and better targeted therapeutic applications.
[0026] It is yet another object of the present invention to identify complexes of
polypeptides or polynucleotides encoding the polypeptides and fragments of the
polypeptides of the genus Shigella and polypeptides and fragments of the polypeptides of
mammals, preferably humans.

[0027] It is yet another object of the present invention to identify antibodies to these
complexes of polypeptides or polynucleotides encoding the polypeptides and fragments of
the polypeptides of the genus Shigella and mammals, preferably humans, including polyclonal
as well as monoclonal antibodies that are used for detection.

[0028] It is still another object of the present invention to identify selected interacting 
domains of the polypeptides, called SID® polypeptides. 

[0029] It is still another object of the present invention to identify selected interacting
domains of the polynucleotides, called SID® polynucleotides. 

[0030] It is another object of the present invention to generate protein-protein
interaction maps called PIM®s.

[0031] It is yet another object of the present invention to provide a method for screening 
drugs for agents which modulate the interaction of proteins and pharmaceutical 
compositions that are capable of modulating the protein-protein interactions between 
Shigella polypeptides and mammalian, preferably human, polypeptides. 
[0032] It is another object to administer the nucleic acids of the present invention via
gene therapy.

[0033] It is yet another object of the present invention to provide protein chips or protein 
microarrays. 

[0034] It is yet another object of the present invention to provide a report in, for example,
paper, electronic and/or digital form, concerning the protein-protein interactions, the
modulating compounds and the like, as well as a PIM®.

[0035] Thus the present invention, in one aspect thereof, relates to a protein complex
between a Shigella polypeptide and a mammalian polypeptide. In another embodiment, the
Shigella and the mammalian polypeptides are the polypeptides set forth in columns 1 and 3,
respectively, of Table II.

[0036] Furthermore, the present invention provides the SID® polynucleotides and SID®
polypeptides of Table III, as well as a PIM® between Shigella polypeptides and mammalian,
preferably human, polypeptides.

[0037] The present invention also provides antibodies to the protein-protein complexes 
between Shigella polypeptides and mammalian, preferably human, polypeptides.
[0038] In another embodiment, the present invention provides a method for screening
drugs for agents that modulate the protein-protein interactions, and pharmaceutical
compositions that are capable of modulating protein-protein interactions.

[0039] In another embodiment, the present invention provides protein chips or protein
microarrays.

[0040] In yet another embodiment, the present invention provides a report in, for
example, paper, electronic and/or digital form.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0041] Fig. 1 is a schematic representation of the pB1 plasmid. 

[0042] Fig. 2 is a schematic representation of the pB5 plasmid. 

[0043] Fig. 3 is a schematic representation of the pB6 plasmid. 

[0044] Fig. 4 is a schematic representation of the pB13 plasmid. 

[0045] Fig. 5 is a schematic representation of the pB14 plasmid. 

[0046] Fig. 6 is a schematic representation of the pB20 plasmid. 

[0047] Fig. 7 is a schematic representation of the pP1 plasmid. 

[0048] Fig. 8 is a schematic representation of the pP2 plasmid. 

[0049] Fig. 9 is a schematic representation of the pP3 plasmid. 

[0050] Fig. 10 is a schematic representation of the pP6 plasmid. 

[0051] Fig. 11 is a schematic representation of the pP7 plasmid.

[0052] Fig. 12 is a schematic representation of vectors expressing the T25 fragment. 

[0053] Fig. 13 is a schematic representation of vectors expressing the T18 fragment.

[0054] Fig. 14 is a schematic representation of various vectors of pCmAHLI, pT25 and
pT18. 

[0055] Fig. 15 is a schematic representation of the identification of a SID®. In this figure,
the "Full-length prey protein" is the Open Reading Frame (ORF) or coding sequence (CDS)
in which the identified prey polypeptides are included. The Selected Interaction Domain
(SID®) is defined as the polypeptide domain shared by every selected prey fragment.
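Read as coordinates, the SID® definition above amounts to an interval intersection. The sketch below rests on an assumption: each selected prey fragment is recorded as hypothetical 1-based, inclusive (start, end) amino-acid positions on the full-length prey protein. The function name `selected_interaction_domain` is illustrative only and is not taken from the described method.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end), 1-based and inclusive, in
# amino-acid coordinates of the full-length prey protein (ORF/CDS).
Fragment = Tuple[int, int]


def selected_interaction_domain(fragments: List[Fragment]) -> Optional[Fragment]:
    """Return the region shared by every selected prey fragment, or None.

    The region common to several intervals starts at the largest start
    and ends at the smallest end.
    """
    if not fragments:
        return None
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    # If the fragments do not all overlap, they share no common domain.
    return (start, end) if start <= end else None


if __name__ == "__main__":
    # Three hypothetical prey fragments selected with the same bait.
    print(selected_interaction_domain([(120, 310), (95, 260), (140, 400)]))  # (140, 260)
```

When the fragments have no common overlap, the sketch returns `None`. How such cases are handled in practice is not specified here.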

[0056] Fig. 16 is a protein interaction map (PIM®).
DETAILED DESCRIPTION OF THE INVENTION 

[0057] As used herein, the terms "polynucleotides", "nucleic acids" and "oligonucleotides"
are used interchangeably and include, but are not limited to, RNA, DNA and RNA/DNA
sequences of more than one nucleotide in either single-stranded or duplex form. The
polynucleotide sequences of the present invention may be prepared by any known method
including, but not limited to, any synthetic method, any recombinant method, any ex vivo
generation method and the like, as well as combinations thereof.

[0058] The term "polypeptide" means herein a polymer of amino acids having no specific 
length. Thus, peptides, oligopeptides and proteins are included in the definition of 
"polypeptide" and these terms are used interchangeably throughout the specification, as well 
as in the claims. The term "polypeptide" does not exclude post-translational modifications 
such as polypeptides having covalent attachment of glycosyl groups, acetyl groups,
phosphate groups, lipid groups and the like. Also encompassed by this definition of 
"polypeptide" are homologs thereof. 

[0059] The term "homologs" means structurally similar genes contained within a given
species, whereas "orthologs" are functionally equivalent genes from a given species or
strain, as determined, for example, in a standard complementation assay. Thus, a
polypeptide of interest can be used not only as a model for identifying similar genes in
given strains, but also to identify homologs and orthologs of the polypeptide of interest in
other species. The orthologs, for example, can also be identified in a conventional
complementation assay. In addition or alternatively, such orthologs can be expected to exist
in bacteria (or other kinds of cells) in the same branch of the phylogenetic tree, as set forth,
for example, at ftp://ftp.cme.msu.edu/pub/rdp/SSU-rRNA/SSU/Prok.phylo .

[0060] As used herein the term "prey polynucleotide" means a chimeric polynucleotide 
encoding a polypeptide comprising (i) a specific domain; and (ii) a polypeptide that is to be 
tested for interaction with a bait polypeptide. The specific domain is preferably a 
transcriptional activating domain. 

[0061] As used herein, a "bait polynucleotide" is a chimeric polynucleotide encoding a
chimeric polypeptide comprising (i) a complementary domain; and (ii) a polypeptide that is to
be tested for interaction with at least one prey polypeptide. The complementary domain is
preferably a DNA-binding domain that recognizes a binding site that is contained in the host
organism and is further detected.

[0062] As used herein, a "complementary domain" means a domain that reconstitutes a
functional activity, for example an enzymatic activity, when the bait and prey interact.

[0063] As used herein, a "specific domain" means a functional interacting activation
domain that may work through different mechanisms, by interacting directly, or indirectly
through intermediary proteins, with RNA polymerase II- or III-associated proteins in the
vicinity of the transcription start site.

[0064] As used herein, the term "complementary" means that, for example, each base of
a first polynucleotide is paired with the complementary base of a second polynucleotide
whose orientation is reversed. The complementary bases are A and T (or A and U), and C
and G.
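The definition above can be illustrated with a short sketch: complement each base, then reverse the orientation. The function names are illustrative. The sketch assumes plain single-letter sequences, and for RNA it pairs A with U as the definition states.

```python
def reverse_complement(seq: str, rna: bool = False) -> str:
    """Complement every base (A-T or A-U, C-G), then reverse the orientation."""
    pairs = {"A": "U" if rna else "T", "T": "A", "U": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def are_complementary(first: str, second: str, rna: bool = False) -> bool:
    """True if each base of `first` pairs with `second` read in reverse."""
    return (len(first) == len(second)
            and reverse_complement(first, rna) == second.upper())


if __name__ == "__main__":
    print(reverse_complement("ATGCC"))           # GGCAT
    print(are_complementary("ATGCC", "GGCAT"))   # True
    print(reverse_complement("AUGC", rna=True))  # GCAU
```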

[0065] The term "sequence identity" refers to the identity between two peptides or
between two nucleic acids. Identity between sequences can be determined by comparing a
position in each of the sequences that may be aligned for the purposes of comparison.
When a position in the compared sequences is occupied by the same base or amino acid,
the sequences are identical at that position. A degree of sequence identity between
nucleic acid sequences is a function of the number of identical nucleotides at positions
shared by these sequences. A degree of identity between amino acid sequences is a
function of the number of identical amino acids at positions shared by these sequences.
Since two polynucleotides may each (i) comprise a sequence (i.e., a portion of a complete
polynucleotide sequence) that is similar between the two polynucleotides, and (ii) further
comprise a sequence that is divergent between the two polynucleotides, sequence identity
comparisons between two or more polynucleotides are performed over a "comparison
window". A comparison window refers to a conceptual segment of at least 20 contiguous
nucleotide positions in which a polynucleotide sequence may be compared to a reference
nucleotide sequence of at least 20 contiguous nucleotides, and in which the portion of the
polynucleotide sequence in the comparison window may comprise additions or deletions
(i.e., gaps) of 20 percent or less compared to the reference sequence (which does not
comprise additions or deletions) for optimal alignment of the two sequences.
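As a sketch under stated assumptions, the two numeric limits on a comparison window (at least 20 positions; additions or deletions of 20 percent or less) can be checked as follows. The sketch assumes the window is given as two equal-length aligned strings with "-" marking a gap. It counts every column that contains a gap as one addition or deletion, and it measures the 20 percent limit against the number of reference residues in the window. This is one reading of the definition, and the function name is hypothetical.

```python
GAP = "-"
MIN_WINDOW = 20       # at least 20 contiguous positions
MAX_GAP_PERCENT = 20  # additions or deletions of 20 percent or less


def is_valid_comparison_window(query: str, reference: str) -> bool:
    """Check an aligned window: equal-length strings, '-' marks a gap."""
    if len(query) != len(reference):
        raise ValueError("aligned window strings must have equal length")
    ref_positions = sum(1 for r in reference if r != GAP)
    # Each column holding a gap in either sequence counts as one indel.
    indels = sum(1 for q, r in zip(query, reference) if GAP in (q, r))
    # Integer arithmetic avoids floating-point edge cases at exactly 20%.
    return (ref_positions >= MIN_WINDOW
            and indels * 100 <= MAX_GAP_PERCENT * ref_positions)


if __name__ == "__main__":
    ref = "ACGT" * 5                                           # 20 positions
    print(is_valid_comparison_window("ACGT" * 4 + "AC--", ref))  # True: 2 gaps
    print(is_valid_comparison_window("-----" + ref[5:], ref))    # False: 5 gaps
```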

[0066] To determine the percent identity of two amino acid sequences or two nucleic
acid sequences, the sequences are aligned for optimal comparison. For example, gaps can
be introduced in the sequence of a first amino acid sequence or a first nucleic acid sequence
for optimal alignment with the second amino acid sequence or second nucleic acid
sequence. The amino acid residues or nucleotides at corresponding amino acid positions or
nucleotide positions are then compared. When a position in the first sequence is occupied
by the same amino acid residue or nucleotide as the corresponding position in the second
sequence, the molecules are identical at that position.

[0067] The percent identity between the two sequences is a function of the number of
identical positions shared by the sequences. Hence, % identity = (number of identical
positions / total number of overlapping positions) x 100.
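The formula above can be sketched for two sequences that have already been aligned. One interpretation is assumed here: "overlapping positions" are the alignment columns in which both sequences carry a residue, so columns containing a gap ("-") are left out of the denominator. The function name is illustrative.

```python
GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """% identity = identical positions / overlapping positions x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Overlapping positions: columns in which both sequences carry a residue.
    overlapping = [(a, b) for a, b in zip(aligned_a, aligned_b)
                   if a != GAP and b != GAP]
    if not overlapping:
        return 0.0
    identical = sum(1 for a, b in overlapping if a == b)
    return identical / len(overlapping) * 100


if __name__ == "__main__":
    # 6 overlapping columns (the gap column is skipped), 5 of them identical.
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 83.3
```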

[0068] In this comparison, the sequences can be the same length or may differ in
length. Optimal alignment of sequences for determining a comparison window may be
conducted by the local homology algorithm of Smith and Waterman (J. Theor. Biol., 91(2)
pgs. 370-380 (1981)), by the homology alignment algorithm of Needleman and Wunsch (J.
Mol. Biol., 48(3) pgs. 443-453 (1970)), by the search for similarity method of Pearson
and Lipman (PNAS, USA, 85(5) pgs. 2444-2448 (1988)), by computerized implementations
of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics
Software Package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison,
Wisconsin) or by inspection.
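Of the methods listed above, the Needleman and Wunsch global alignment can be shown as a compact dynamic-programming sketch. The scoring is an illustrative assumption (+1 for a match, -1 for a mismatch, -1 per gap position, linear gap penalty). It is not the scoring used by GAP, BESTFIT or any other named program. The traceback returns one optimal alignment, and others with the same score may exist.

```python
from typing import List, Tuple

GAP = "-"


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str, int]:
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning a[i - 1] with b[j - 1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    aligned_a, aligned_b, best = needleman_wunsch("GATTACA", "GATCA")
    print(aligned_a)
    print(aligned_b)
    print(best)  # 3
```

The aligned strings can then be passed to a percent-identity calculation. Local alignment in the manner of Smith and Waterman would need a different recurrence, with scores floored at zero, and is not shown.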

[0069] The best alignment (i.e., resulting in the highest percentage of identity over the 
comparison window) generated by the various methods is selected. 

[0070] The term "sequence identity" means that two polynucleotide sequences are 
identical (i.e., on a nucleotide by nucleotide basis) over the window of comparison. The term 
"percentage of sequence identity" is calculated by comparing two optimally aligned 
sequences over the window of comparison, determining the number of positions at which the 
identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the 
number of matched positions, dividing the number of matched positions by the total number 
of positions in the window of comparison (i.e., the window size) and multiplying the result by 
100 to yield the percentage of sequence identity. The same process can be applied to 
polypeptide sequences.

[0071] The percentage of sequence identity of a nucleic acid sequence or an amino acid 
sequence can also be calculated using BLAST software (Version 2.06 of September 1998)
with default or user-defined parameters.

[0072] The term "sequence similarity" means that amino acids can be modified while 
retaining the same function. It is known that amino acids are classified according to the 
nature of their side groups and some amino acids such as the basic amino acids can be 
interchanged for one another while their basic function is maintained. 
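The idea that residues with similar side chains can replace one another can be sketched with a lookup table. The grouping below is an assumption: it is one common illustrative classification by side-chain chemistry, and other published groupings differ in detail. The document does not specify a grouping, and the function name is hypothetical.

```python
# One illustrative grouping of the 20 standard amino acids by side chain
# (an assumption; other published groupings differ in detail).
SIDE_CHAIN_CLASSES = {
    "basic": "KRH",
    "acidic": "DE",
    "polar_uncharged": "STNQ",
    "aliphatic": "GAVLIM",
    "aromatic": "FWY",
    "special": "CP",
}
RESIDUE_CLASS = {aa: cls for cls, members in SIDE_CHAIN_CLASSES.items()
                 for aa in members}


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues fall in the same side-chain class."""
    return RESIDUE_CLASS[original.upper()] == RESIDUE_CLASS[substitute.upper()]


if __name__ == "__main__":
    print(is_conservative("K", "R"))  # True: both basic
    print(is_conservative("K", "D"))  # False: basic versus acidic
```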

[0073] The term "isolated" as used herein means that a biological material such as a 
nucleic acid or protein has been removed from its original environment in which it is naturally 
present. For example, a polynucleotide present in a plant, mammal or animal is present in 
its natural state and is not considered to be isolated. The same polynucleotide separated 
from the adjacent nucleic acid sequences in which it is naturally inserted in the genome of
the plant or animal is considered as being "isolated." 

[0074] The term "isolated" is not meant to exclude artificial or synthetic mixtures with 
other compounds, or the presence of impurities which do not interfere with the biological 
activity and which may be present, for example, due to incomplete purification, addition of 
stabilizers or mixtures with pharmaceutically acceptable excipients and the like.
[0075] "Isolated polypeptide" or "isolated protein" as used herein means a polypeptide or 
protein which is substantially free of those compounds that are normally associated with the 
polypeptide or protein in its natural state, such as other proteins or polypeptides, nucleic
acids, carbohydrates, lipids and the like. 
[0076] The term "purified" as used herein means that at least one order of magnitude of
purification is achieved, preferably two or three orders of magnitude, most preferably four or
five orders of magnitude of purification of the starting material or of the natural material.
Thus, the term "purified" as utilized herein does not mean that the material is 100% pure,
i.e., free of any other material.
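The arithmetic behind "orders of magnitude of purification" is a base-10 logarithm of the fold enrichment. This sketch assumes that purity is expressed as the fraction of the material of interest (a specific-activity ratio would work the same way). The function names and the example values are hypothetical.

```python
import math


def orders_of_purification(initial_purity: float, final_purity: float) -> float:
    """Return log10 of the fold enrichment of the material of interest."""
    if initial_purity <= 0 or final_purity <= 0:
        raise ValueError("purity values must be positive")
    return math.log10(final_purity / initial_purity)


def is_purified(initial_purity: float, final_purity: float,
                min_orders: float = 1.0) -> bool:
    """Default threshold: at least one order of magnitude (10-fold)."""
    return orders_of_purification(initial_purity, final_purity) >= min_orders


if __name__ == "__main__":
    # Hypothetical: target is 0.01% of the starting material and 10% afterwards.
    print(round(orders_of_purification(0.0001, 0.10), 6))  # 3.0
    print(is_purified(0.0001, 0.10))  # True
```

In this example the material is only 10% pure, yet it has undergone three orders of magnitude of purification. This is consistent with the statement that "purified" does not mean 100% pure.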

[0077] The term "variants", when referring to, for example, polynucleotides encoding a
polypeptide variant of a given reference polypeptide, means polynucleotides that differ from
the reference polynucleotide but generally maintain the functional characteristics of the
reference polypeptide. A variant of a polynucleotide may be a naturally occurring allelic
variant, or it may be a variant that is not known to occur naturally. Such non-naturally
occurring variants of the reference polynucleotide can be made by, for example,
mutagenesis techniques, including those applied to polynucleotides, cells or organisms.

[0078] Generally, differences are limited so that the nucleotide sequences of the 
reference and variant are closely similar overall and, in many regions, identical.

[0079] Variants of polynucleotides according to the present invention include, but are not 
limited to, nucleotide sequences which are at least 95% identical after alignment to the 
reference polynucleotide encoding the reference polypeptide. These variants can also have 
96%, 97%, 98% or 99.999% sequence identity to the reference polynucleotide.
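The identity thresholds above presuppose an alignment. A minimal Python sketch, assuming the two sequences have already been aligned by some external tool and padded with '-' gaps to equal length, and assuming one common convention (identical non-gap columns divided by all alignment columns). Other conventions divide by the shorter sequence or ignore gapped columns, so the same pair can score differently; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Convention assumed here: identical non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if identity is at least `threshold` percent (95.0 mirrors the lowest bound above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A single mismatch over a 20-nucleotide alignment gives exactly 95% identity, the lowest bound stated above.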

[0080] Nucleotide changes present in a variant polynucleotide may be silent, which 
means that these changes do not alter the amino acid sequences encoded by the reference 
polynucleotide. 
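Whether a substitution is silent can be checked by translating both coding sequences and comparing the products. The sketch below assumes the standard genetic code (NCBI translation table 1), an in-frame DNA coding sequence written in A/C/G/T, and substitutions only (no insertions or deletions); the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[p:p + 3]] for p in range(0, len(cds), 3))


def is_silent(reference: str, variant: str) -> bool:
    """True if the variant differs in nucleotides but encodes the same amino acid sequence."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    return reference.upper() != variant.upper() and translate(reference) == translate(variant)
```

For instance, changing GCT to GCC keeps alanine, so `is_silent("ATGGCTAAA", "ATGGCCAAA")` is true, whereas GCT to GAT (alanine to aspartate) is not silent.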

[0081] Substitutions, additions and/or deletions can involve one or more nucleotides.
Alterations can produce conservative or non-conservative amino acid substitutions, deletions 
and/or additions. 

[0082] Variants of a prey or a SID® polypeptide encoded by a variant polynucleotide can
possess a higher affinity of binding and/or a higher specificity of binding to the protein or
polypeptide counterpart against which they were initially selected. In another context,
variants can also lose their ability to bind to their protein or polypeptide counterpart.
[0083] By "anabolic pathway" is meant a reaction or series of reactions in a metabolic 
pathway that synthesize complex molecules from simpler ones, usually requiring the input of 
energy. An anabolic pathway is the opposite of a catabolic pathway. 

[0084] As used herein, a "catabolic pathway" is a series of reactions in a metabolic 
pathway that break down complex compounds into simpler ones, usually releasing energy in 
the process. A catabolic pathway is the opposite of an anabolic pathway. 
[0085] As used herein, "drug metabolism" means the study of how drugs are
processed and broken down by the body. Drug metabolism can involve the study of
enzymes that break down drugs, the study of how different drugs interact within the body
and how diet and other ingested compounds affect the way the body processes drugs.
[0086] As used herein, "metabolism" means the sum of all of the enzyme-catalyzed 
reactions in living cells that transform organic molecules.
[0087] By "secondary metabolism" is meant the pathways producing specialized metabolic
products that are not found in every cell.

[0088] As used herein, "SID®" means a Selected Interacting Domain and is identified as
follows: for each bait polypeptide screened, the selected prey polypeptides are compared.
Overlapping fragments in the same ORF or CDS define the selected interacting domain.
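The SID® definition amounts to intersecting the prey fragments that map to the same ORF or CDS. As a hedged sketch, assuming each fragment is given as hypothetical 1-based inclusive (start, end) coordinates on a single ORF, and taking the strict intersection of all fragments (an actual pipeline may group or filter fragments differently):

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: (start, end), 1-based and inclusive, on one ORF/CDS.
Interval = Tuple[int, int]


def selected_interacting_domain(fragments: Iterable[Interval]) -> Optional[Interval]:
    """Return the region shared by every prey fragment, or None if no common region exists."""
    frags = list(fragments)
    if not frags:
        return None
    for start, end in frags:
        if start > end:
            raise ValueError(f"invalid fragment ({start}, {end})")
    # The common region starts at the latest start and ends at the earliest end.
    start = max(s for s, _ in frags)
    end = min(e for _, e in frags)
    return (start, end) if start <= end else None
```

With hypothetical fragments (10, 250), (40, 300) and (25, 180), the shared region is (40, 180).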

[0089] As used herein the term "PIM®" means a protein-protein interaction map. This
map is obtained from data acquired from a number of separate screens using different bait
polypeptides and is designed to map out all of the interactions between the polypeptides.
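One simple way to picture a PIM® is as an undirected graph assembled from the individual screens. The Python sketch below assumes each screen is reported as a bait identifier mapped to its selected prey identifiers; the identifiers and the function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_interaction_map(screens: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Merge per-bait screen results into one undirected protein-protein interaction map.

    `screens` maps each bait to the preys selected in its screen.
    """
    pim: Dict[str, Set[str]] = defaultdict(set)
    for bait, preys in screens.items():
        for prey in preys:
            # Record the edge in both directions so the map can be queried from either partner.
            pim[bait].add(prey)
            pim[prey].add(bait)
    return dict(pim)
```

A prey selected in several screens then shows up as a node shared by several baits.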
[0090] The term "affinity of binding", as used herein, can be defined as the affinity
constant Ka of a given SID® polypeptide of the present invention for a polypeptide to which
it binds, given by the following mathematical relationship:

$$K_a = \frac{[\text{SID®/polypeptide complex}]}{[\text{free SID®}]\,[\text{free polypeptide}]}$$

[0094] wherein [free SID®], [free polypeptide] and [SID®/polypeptide complex] are the
equilibrium concentrations, respectively, of the free SID® polypeptide, of the free
polypeptide onto which the SID® polypeptide binds, and of the complex formed between the
SID® polypeptide and the polypeptide onto which said SID® polypeptide specifically binds.
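The relationship in paragraphs [0090] to [0094] translates directly into code. This is a minimal sketch, assuming the equilibrium concentrations are already known in mol/L; the function names are hypothetical, and the dissociation constant Kd = 1/Ka is included only as the usual reciprocal convenience, not as part of the definition above.

```python
def association_constant(complex_conc: float, free_sid_conc: float,
                         free_partner_conc: float) -> float:
    """Ka = [complex] / ([free SID] * [free partner]) from equilibrium concentrations.

    With concentrations in mol/L, Ka comes out in L/mol (M^-1).
    """
    if complex_conc < 0:
        raise ValueError("complex concentration cannot be negative")
    if free_sid_conc <= 0 or free_partner_conc <= 0:
        raise ValueError("free concentrations must be positive")
    return complex_conc / (free_sid_conc * free_partner_conc)


def dissociation_constant(ka: float) -> float:
    """Kd = 1 / Ka, in mol/L when Ka is in L/mol."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka


# Hypothetical equilibrium: 1 uM complex, 1 uM free SID polypeptide, 1 uM free partner.
ka = association_constant(1e-6, 1e-6, 1e-6)
```

Here Ka is about 1e6 M^-1, corresponding to a Kd of about 1 uM.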

[0095] The affinity of a SID® polypeptide of the present invention or a variant thereof for
its polypeptide counterpart can be assessed, for example, on a Biacore™ apparatus
marketed by Amersham Pharmacia Biotech Company, as described by Szabo et al., Curr.
Opin. Struct. Biol. 5, pgs. 699-705 (1995) and by Edwards and Leatherbarrow, Anal. Biochem.
246, pgs. 1-6 (1997).

[0096] As used herein the phrase "at least the same affinity" with respect to the binding
affinity between a SID® polypeptide of the present invention and another polypeptide means
that the Ka is identical to, or at least two-fold, at least three-fold or at least five-fold
greater than, the reference Ka value.
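The comparison in paragraph [0096] is a fold-change test on Ka. A minimal sketch with hypothetical function names: min_fold = 1.0 covers the "identical or greater" reading, and 2.0, 3.0 or 5.0 the stricter ones. Measured Ka values carry experimental error, which this sketch does not model.

```python
def fold_change(ka_candidate: float, ka_reference: float) -> float:
    """Ratio of the candidate Ka to the reference Ka."""
    if ka_candidate <= 0 or ka_reference <= 0:
        raise ValueError("Ka values must be positive")
    return ka_candidate / ka_reference


def has_at_least_same_affinity(ka_candidate: float, ka_reference: float,
                               min_fold: float = 1.0) -> bool:
    """True if the candidate Ka is at least `min_fold` times the reference Ka."""
    return fold_change(ka_candidate, ka_reference) >= min_fold
```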

[0097] As used herein, the term "modulating compound" means a compound that
inhibits or stimulates, or acts on another protein that can inhibit or stimulate, the
protein-protein interaction of a complex of two polypeptides or the protein-protein
interaction of two polypeptides.

[0098] More specifically, the present invention comprises complexes of polypeptides or
polynucleotides encoding the polypeptides composed of a bait polypeptide, or a bait 
polynucleotide encoding a bait polypeptide and a prey polypeptide or a prey polynucleotide 
encoding a prey polypeptide. The prey polypeptide or prey polynucleotide encoding the prey 
polypeptide is capable of interacting with a bait polypeptide of interest in various hybrid 
systems. 

[0099] As described in the Background of the present invention, there are various
methods known in the art to identify prey polypeptides that interact with bait polypeptides of
interest. These methods include, but are not limited to, generic two-hybrid systems as
described by Fields et al., Nature, 340:245-246 (1989) and more specifically in U.S. Patent
Nos. 5,283,173, 5,468,614 and 5,667,973, which are hereby incorporated by reference; the
reverse two-hybrid system described by Vidal et al., supra; the two plus one hybrid method
described, for example, in Tirode et al., supra; the yeast forward and reverse 'n'-hybrid
systems as described in Vidal and Legrain, supra; the method described in WO 99/42612;
those methods described in Legrain et al., FEBS Letters 480 pgs. 32-36 (2000) and the like.
[0100] The present invention is not limited to the type of method utilized to detect
protein-protein interactions, and therefore any method known in the art and variants thereof
can be used. It is, however, preferable to use the method described in WO 99/42612 or WO
00/66722, both references incorporated herein by reference, because of the sensitivity,
reproducibility and reliability of these methods.

[0101] Protein-protein interactions can also be detected using complementation assays
such as those described by Pelletier et al. at
http://www.abrf.org/JBT/Articles/JBT0012/jbt0012.html, WO 00/07038 and WO 98/34120.

[0102] Although the above methods are described for applications in the yeast system,
the present invention is not limited to detecting protein-protein interactions using yeast, but
also includes similar methods that can be used in detecting protein-protein interactions in, for
example, mammalian systems as described, for example, in Takacs et al., Proc. Natl. Acad.
Sci. USA, 90(21):10375-79 (1993) and Vasavada et al., Proc. Natl. Acad. Sci. USA,
88(23):10686-90 (1991), as well as a bacterial two-hybrid system as described in Karimova et
al. (1998), WO 99/28746, WO 00/66722 and Legrain et al., FEBS Letters, 480 pgs. 32-36
(2000).

[0103] Although the above-described methods are limited to the use of yeast, mammalian cells
and Escherichia coli cells, the present invention is not limited in this manner. Consequently, 
mammalian and typically human cells, as well as bacterial, yeast, fungus, insect, nematode 
and plant cells are encompassed by the present invention and may be transfected by the 
nucleic acid or recombinant vector as defined herein. 

[0104] Examples of suitable cells include, but are not limited to, VERO cells, HELA cells 
such as ATCC No. CCL2, CHO cell lines such as ATCC No. CCL61, COS cells such as 
COS-7 cells and ATCC No. CRL 1650 cells, WI38, BHK, HepG2, 3T3 such as ATCC No.
CRL6361, A549, PC12, K562 cells, 293 cells, Sf9 cells such as ATCC No. CRL1711 and
CV-1 cells such as ATCC No. CCL70.

[0105] Other suitable cells that can be used in the present invention include, but are not 
limited to, prokaryotic host cell strains such as Escherichia coli (e.g., strain DH5α),
Bacillus subtilis, Salmonella typhimurium, or strains of the genera of Pseudomonas, 
Streptomyces and Staphylococcus. 

[0106] Further suitable cells that can be used in the present invention include yeast cells 
such as those of Saccharomyces such as Saccharomyces cerevisiae.

[0107] The bait polynucleotide, as well as the prey polynucleotide can be prepared 
according to the methods known in the art such as those described above in the publications 
and patents reciting the known methods per se.

[0108] The bait polynucleotide of the present invention is obtained from Shigella flexneri
(see Table I). The prey polynucleotide is obtained from a human placenta cDNA or variants
thereof and fragments from the genome or transcriptome of human placenta ranging from
about 12 to about 5,000, or about 12 to about 10,000 or from about 12 to about 20,000
nucleotides. The prey polynucleotide is then selected, sequenced and identified.

[0109] A human placenta cDNA prey library is prepared from global human placenta and 
constructed in the specially designed prey vector pP6 as shown in Figure 10 after ligation of 
suitable linkers such that every cDNA fragment insert is fused to a nucleotide sequence in 
the vector that encodes a transcription activation domain capable of activating a reporter gene. Any
transcription activation domain can be used in the present invention. Examples include, but 
are not limited to, Gal4, VP16, B42, His and the like. Toxic reporter genes, such as CAT R,
CYH2, CYH1, URA3, bacterial and fungi toxins and the like can be used in reverse two- 
hybrid systems. 
[0110] The polypeptides encoded by the nucleotide inserts of the human placenta cDNA
prey library thus prepared are termed "prey polypeptides" in the context of the presently 
described selection method of the prey polynucleotides. 

[0111] The bait polynucleotide can be inserted in bait plasmid pB6 or pB20 as illustrated
in Figure 3 or 6, respectively. The bait polynucleotide insert is fused to a polynucleotide
encoding a DNA-binding domain, for example the Gal4 DNA-binding domain, and the shuttle
expression vector is used to transform cells. The bait polynucleotides used in the present
invention are described in Table I. As stated above, any cells can be transformed with the
bait and prey polynucleotides of the present invention, including mammalian cells,
bacterial cells, yeast cells, insect cells and the like.

[0112] In an embodiment, the present invention identifies protein-protein interactions in
yeast. Using known methods, a positive prey clone is identified that contains a vector
comprising a nucleic acid insert encoding a prey polypeptide which binds to a bait
polypeptide of interest. The method by which protein-protein interactions are identified
comprises the following steps:

[0113] i) mating at least one first haploid recombinant yeast cell clone from a recombinant
yeast cell clone library that has been transformed with a plasmid containing the prey
polynucleotide to be assayed with a second haploid recombinant yeast cell clone
transformed with a plasmid containing a bait polynucleotide encoding the bait
polypeptide;

[0114] ii) cultivating diploid cell clones obtained in step i) on a selective medium; and
[0115] iii) selecting recombinant cell clones which grow on the selective medium.
[0116] This method may further comprise the step of: 

[0117] iv) characterizing the prey polynucleotide contained in each recombinant cell 
clone which is selected in step iii). 
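The logic of the mating, selection and characterization steps can be illustrated with a toy in-silico model. This is not a laboratory protocol: the `interacts` predicate below is a hypothetical stand-in for reporter activation in the diploid cells, and the bait and prey identifiers are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Diploid:
    """A diploid clone carrying one bait plasmid and one prey plasmid."""
    bait: str
    prey: str


def mate(bait: str, prey_library: Iterable[str]) -> List[Diploid]:
    """Mating step: combine the bait haploid clone with each prey haploid clone."""
    return [Diploid(bait, prey) for prey in prey_library]


def select_growing(diploids: Iterable[Diploid],
                   interacts: Callable[[str, str], bool]) -> List[Diploid]:
    """Cultivation and selection steps: only clones whose reporter is activated
    (modelled by the hypothetical `interacts` predicate) grow and are kept."""
    return [d for d in diploids if interacts(d.bait, d.prey)]


def characterize(selected: Iterable[Diploid]) -> List[str]:
    """Characterization step: stand-in for sequencing the prey insert of each
    selected clone; repeated hits of the same prey are collapsed."""
    return sorted({d.prey for d in selected})


# Usage with made-up identifiers (not taken from Tables I to III):
known_pairs = {("baitX", "prey2"), ("baitX", "prey5")}
hits = characterize(select_growing(mate("baitX", ["prey1", "prey2", "prey3", "prey5", "prey2"]),
                                   lambda b, p: (b, p) in known_pairs))
print(hits)  # ['prey2', 'prey5']
```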

[0118] In yet another embodiment of the present invention, in lieu of yeast, Escherichia 
coli is used in a bacterial two-hybrid system, which encompasses a similar principle to that 
described above for yeast, but does not involve mating for characterizing the prey 
polynucleotide. 

[0119] In yet another embodiment of the present invention, mammalian cells and a 
method similar to that described above for yeast for characterizing the prey polynucleotide 
are used. 

[0120] By using the yeast, bacterial or mammalian two-hybrid system, it is possible
to identify, for one particular bait, an interacting prey polypeptide. The prey polynucleotide
selected by screening the prey library with the two-hybrid method, the two plus one hybrid
method or the like encodes the polypeptide interacting with the protein of
interest.
[0121] The present invention is also directed, in a general aspect, to a complex of
polypeptides, or of polynucleotides encoding the polypeptides, composed of a bait polypeptide or
bait polynucleotide encoding the bait polypeptide and a prey polypeptide or prey 
polynucleotide encoding the prey polypeptide capable of interacting with the bait polypeptide 
of interest. These complexes are identified in Table II, as the bait amino acid sequences 
and the prey amino acid sequences, as well as the bait and prey nucleic acid sequences. 
[0122] In another aspect, the present invention relates to a complex of polynucleotides 
consisting of a first polynucleotide, or a fragment thereof, encoding a prey polypeptide that 
interacts with a bait polypeptide and a second polynucleotide or a fragment thereof. Such a
fragment has at least 12 consecutive nucleotides, and can have between 12 and 5,000
consecutive nucleotides, or between 12 and 10,000 consecutive nucleotides or between 12 
and 20,000 consecutive nucleotides. 

[0123] The polypeptides of columns 1 and 3 of Table II according to the present
invention and the complexes of these two polypeptides also form part of the present
invention. More specifically, the polypeptides of SEQ ID NOS. 1 to 7 and their complexes
with the polypeptides of column 3 of Table II are part of the present invention.
[0124] In yet another embodiment, the present invention relates to an isolated complex 
of at least two polypeptides encoded by two polynucleotides wherein said two polypeptides 
are associated in the complex by affinity binding and are depicted in columns 1 and 3 of 
Table II. 

[0125] In yet another embodiment, the present invention relates to an isolated complex 
comprising at least a polypeptide as described in column 1 of Table II and a polypeptide as 
described in column 3 of Table II. The present invention is not limited to these polypeptide 
complexes alone but also includes isolated complexes of the two polypeptides in which
fragments and/or homologous polypeptides exhibiting at least 95% sequence identity, or
from 96% to 99.999% sequence identity, are used.

[0126] Also encompassed in another embodiment of the present invention is an isolated
complex in which the SID®s of the prey polypeptides, encoded by SEQ ID Nos. 15 to 215 in
Table III, form the isolated complex.

[0127] Besides the isolated complexes described above, nucleic acids coding for a 
Selected Interacting Domain (SID®) polypeptide or a variant thereof or any of the nucleic 
acids set forth in Table III can be inserted into an expression vector which contains the 
necessary elements for the transcription and translation of the inserted protein-coding 
sequence. Such transcription elements include a regulatory region and a promoter. Thus, 
the nucleic acid which may encode a marker compound of the present invention is operably 
linked to a promoter in the expression vector. The expression vector may also include a 
replication origin. 
[0128] A wide variety of host/expression vector combinations can be employed in
expressing the nucleic acids of the present invention. Useful expression vectors that can be
used include, for example, segments of chromosomal, non-chromosomal and synthetic DNA
sequences. Suitable vectors include, but are not limited to, derivatives of SV40 and pcDNA
and known bacterial plasmids such as ColE1, pCR1, pBR322, pMal-C2, pET, pGEX as
described by Smith et al. (1988), pMB9 and derivatives thereof, plasmids such as
RP4, phage DNAs such as the numerous derivatives of phage λ such as NM989, as well as
other phage DNA such as M13 and filamentous single-stranded phage DNA; yeast plasmids
such as the 2μ plasmid or derivatives of the 2μ plasmid, as well as centromeric and
integrative yeast shuttle vectors; vectors useful in eukaryotic cells such as vectors useful in
insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs,
such as plasmids that have been modified to employ phage DNA or the expression control
sequences; and the like.

[0129] For example, in a baculovirus expression system, both non-fusion transfer
vectors, such as, but not limited to, pVL941 (BamHI cloning site; Summers), pVL1393 (BamHI,
SmaI, XbaI, EcoRI, NotI, XmaIII, BglII and PstI cloning sites; Invitrogen), pVL1392 (BglII, PstI,
NotI, XmaIII, EcoRI, XbaI, SmaI and BamHI cloning site; Summers and Invitrogen) and
pBlueBacIII (BamHI, BglII, PstI, NcoI and HindIII cloning site, with blue/white recombinant
screening; Invitrogen), and fusion transfer vectors such as, but not limited to, pAc700 (BamHI
and KpnI cloning sites, in which the BamHI recognition site begins with the initiation codon;
Summers), pAc701 and pAc702 (same as pAc700, with different reading frames), pAc360
(BamHI cloning site 36 base pairs downstream of a polyhedrin initiation codon; Invitrogen
(195)) and pBlueBacHisA, B, C (three different reading frames with BamHI, BglII, PstI, NcoI
and HindIII cloning site, an N-terminal peptide for ProBond purification and blue/white
recombinant screening of plaques; Invitrogen (220)) can be used.

[0130] Mammalian expression vectors contemplated for use in the invention include
vectors with inducible promoters, such as the dihydrofolate reductase promoters, any
expression vector with a DHFR expression cassette or a DHFR/methotrexate
co-amplification vector such as pED (PstI, SalI, SbaI, SmaI and EcoRI cloning sites, with the
vector expressing both the cloned gene and DHFR; Kaufman, 1991). Alternatively, a
glutamine synthetase/methionine sulfoximine co-amplification vector, such as pEE14
(HindIII, XbaI, SmaI, SbaI, EcoRI and BclI cloning sites, in which the vector expresses
glutamine synthetase and the cloned gene; Celltech), can be used. A vector that directs
episomal expression under the control of the Epstein Barr Virus (EBV) or nuclear antigen
(EBNA) can be used, such as pREP4 (BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII and
KpnI cloning sites, constitutive RSV-LTR promoter, hygromycin selectable marker;
Invitrogen), pCEP4 (BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII and KpnI cloning
sites, constitutive hCMV immediate early gene promoter, hygromycin selectable marker;
Invitrogen), pMEP4 (KpnI, PvuI, NheI, HindIII, NotI, XhoI, SfiI, BamHI cloning sites,
inducible metallothionein IIa gene promoter, hygromycin selectable marker; Invitrogen),
pREP8 (BamHI, XhoI, NotI, HindIII, NheI and KpnI cloning sites, RSV-LTR promoter,
histidinol selectable marker; Invitrogen), pREP9 (KpnI, NheI, HindIII, NotI, XhoI, SfiI, BamHI
cloning sites, RSV-LTR promoter, G418 selectable marker; Invitrogen), and pEBVHis
(RSV-LTR promoter, hygromycin selectable marker, N-terminal peptide purifiable via
ProBond resin and cleaved by enterokinase; Invitrogen).

[0131] Selectable mammalian expression vectors for use in the invention include, but
are not limited to, pRc/CMV (HindIII, BstXI, NotI, SbaI and ApaI cloning sites, G418
selection; Invitrogen), pRc/RSV (HindII, SpeI, BstXI, NotI, XbaI cloning sites, G418 selection;
Invitrogen) and the like. Vaccinia virus mammalian expression vectors (see, for example,
Kaufman, 1991) that can be used in the present invention include, but are not limited to,
pSC11 (SmaI cloning site, TK- and β-gal selection), pMJ601 (SalI, SmaI, AflI, NarI, BspMII,
BamHI, ApaI, NheI, SacII, KpnI and HindIII cloning sites; TK- and β-gal selection),
pTKgptF1S (EcoRI, PstI, SalI, AccI, HindII, SbaI, BamHI and HpaI cloning sites, TK or XPRT
selection) and the like.

[0132] Yeast expression systems that can also be used in the present invention include,
but are not limited to, the non-fusion pYES2 vector (XbaI, SphI, ShoI, NotI, GstXI, EcoRI,
BstXI, BamHI, SacI, KpnI and HindIII cloning sites; Invitrogen), the fusion pYESHisA, B, C
(XbaI, SphI, ShoI, NotI, BstXI, EcoRI, BamHI, SacI, KpnI and HindIII cloning sites, N-terminal
peptide purified with ProBond resin and cleaved with enterokinase; Invitrogen), pRS vectors
and the like.

[0133] Consequently, mammalian and typically human cells, as well as bacterial, yeast, 
fungi, insect, nematode and plant cells can be used in the present invention and may be
transfected by the nucleic acid or recombinant vector as defined herein. 
[0134] Examples of suitable cells include, but are not limited to, VERO cells, HELA cells 
such as ATCC No. CCL2, CHO cell lines such as ATCC No. CCL61, COS cells such as 
COS-7 cells and ATCC No. CRL 1650 cells, WI38, BHK, HepG2, 3T3 such as ATCC No.
CRL6361, A549, PC12, K562 cells, 293 cells, Sf9 cells such as ATCC No. CRL1711 and
CV-1 cells such as ATCC No. CCL70.

[0135] Other suitable cells that can be used in the present invention include, but are not 
limited to, prokaryotic host cell strains such as Escherichia coli (e.g., strain DH5α),
Bacillus subtilis, Salmonella typhimurium, or strains of the genera of Pseudomonas, 
Streptomyces and Staphylococcus. 

[0136] Further suitable cells that can be used in the present invention include yeast cells 
such as those of Saccharomyces such as Saccharomyces cerevisiae. 
[0137] Besides the specific isolated complexes, as described above, the present
invention relates to and also encompasses SID® polynucleotides. As explained above, for
each bait polypeptide, the several prey polypeptides identified are compared and the
intersection of all isolated fragments that are included in the same polypeptide is selected.
Thus the SID® polynucleotides of the present invention are represented by the
shared nucleic acid sequences of SEQ ID Nos. 15 to 215 encoding the SID® polypeptides of
SEQ ID Nos. 216 to 416 in columns 5 and 7 of Table III, respectively.

[0138] The present invention is not limited to the SID® sequences as described in the
above paragraph, but also includes fragments of these sequences having at least 12
consecutive nucleotides, between 12 and 5,000 consecutive nucleotides, between 12
and 10,000 consecutive nucleotides or between 12 and 20,000 consecutive
nucleotides, as well as variants thereof. The fragments or variants of the SID® sequences
possess at least the same affinity of binding to the protein or polypeptide counterpart
against which they were initially selected. Moreover, these variants and/or fragments of the
SID® sequences alternatively can have between 95% and 99.999% sequence identity to
their protein or polypeptide counterpart.

[0139] According to the present invention the variants can be created by known 
mutagenesis techniques either in vitro or in vivo. Such a variant can be created such that it 
has altered binding characteristics with respect to the target protein and more specifically 
that the variant binds the target sequence with either higher or lower affinity. 

[0140] Polynucleotides that are complementary to the above sequences, which include
the polynucleotides of the SID®s, their fragments, variants and those that have the specified
sequence identity, are also included in the present invention.

[0141] The polynucleotide encoding the SID® polypeptide, fragment or variant thereof 
can also be inserted into recombinant vectors which are described in detail above. 

[0142] The present invention also relates to a composition comprising the above-mentioned
recombinant vectors containing polynucleotides encoding the SID® polypeptides in Table III,
fragments or variants thereof, as well as recombinant host cells transformed by the vectors. The
recombinant host cells that can be used in the present invention were discussed in greater 
detail above. 

[0143] The compositions comprising the recombinant vectors can contain physiologically
acceptable carriers such as diluents, adjuvants, excipients and any vehicle in which this
composition can be delivered therapeutically, and can include, but are not limited to, sterile
liquids such as water and oils.

[0144] In yet another embodiment, the present invention relates to a method of selecting 
modulating compounds, as well as the modulating molecules or compounds themselves 
which may be used in a pharmaceutical composition. These modulating compounds may 
act as a cofactor, as an inhibitor, as antibodies, as tags, as a competitive inhibitor, as an
activator or alternatively have agonistic or antagonistic activity on the protein-protein 
interactions. 

[0145] The activity of the modulating compound does not necessarily, for example, have 
to be 100% activation or inhibition. Indeed, even partial activation or inhibition can be of
pharmaceutical interest.

[0146] The modulating compound can be selected according to a method which 
comprises: 

[0147] cultivating a recombinant host cell with a modulating compound on a selective 
medium and a reporter gene the expression of which is toxic for said recombinant host cell 
wherein said recombinant host cell is transformed with two vectors: 

[0148] wherein said first vector comprises a polynucleotide encoding a first hybrid 
polypeptide having a DNA binding domain; 

[0149] wherein said second vector comprises a polynucleotide encoding a second 
hybrid polypeptide having a transcriptional activating domain that activates said toxic
reporter gene when the first and second hybrid polypeptides interact;

[0150] selecting said modulating compound which inhibits or permits the growth of said
recombinant host cell.
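The selection in paragraphs [0147] to [0150] relies on inverted logic: because the reporter is toxic, an interacting bait/prey pair kills the host cell, and a compound that disrupts the interaction lets it grow. A toy Python model of that logic, assuming all-or-nothing disruption (partial inhibition, mentioned in paragraph [0145], is not modelled) and hypothetical compound names:

```python
from typing import Dict, List


def toxic_reporter_on(hybrids_interact: bool, compound_disrupts: bool) -> bool:
    """The toxic reporter is transcribed only while the two hybrid polypeptides interact."""
    return hybrids_interact and not compound_disrupts


def host_cell_grows(hybrids_interact: bool, compound_disrupts: bool) -> bool:
    """Reporter expression kills the host cell, so growth means the reporter is off."""
    return not toxic_reporter_on(hybrids_interact, compound_disrupts)


def select_inhibitors(candidates: Dict[str, bool], hybrids_interact: bool = True) -> List[str]:
    """Return candidates that restore growth of a host whose hybrids interact.

    `candidates` maps a hypothetical compound name to whether it disrupts the
    bait/prey interaction; in a real screen, growth is what reveals this.
    """
    if not hybrids_interact:
        raise ValueError("without an interacting pair every cell grows; the screen is uninformative")
    return [name for name, disrupts in candidates.items()
            if host_cell_grows(hybrids_interact, disrupts)]
```

The guard on `hybrids_interact` reflects a design point of this kind of screen: with a non-interacting pair every cell grows, so growth alone would not identify an inhibitor.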

[0151] Thus, the present invention relates to a modulating compound that inhibits the
protein-protein interactions between Shigella flexneri polypeptide and human placenta
polypeptide of columns 1 and 3 of Table II, respectively. The present invention also relates 
to a modulating compound that activates the protein-protein interactions between Shigella 
flexneri polypeptide and human placenta polypeptide of columns 1 and 3 of Table II, 
respectively. 

[0152] In yet another embodiment, the present invention relates to a method of selecting 
a modulating compound, which modulating compound inhibits the interaction between 
Shigella flexneri polypeptide and human placenta polypeptide of columns 1 and 3 of Table II, 
respectively. This method comprises: 

(a) cultivating a recombinant host cell with a modulating compound on a selective
medium and a reporter gene the expression of which is toxic for said recombinant host cell
wherein said recombinant host cell is transformed with two vectors:

(i) wherein said first vector comprises a polynucleotide encoding a first hybrid 
polypeptide having a first domain of an enzyme; 

(ii) wherein said second vector comprises a polynucleotide encoding a second 
hybrid polypeptide having an enzymatic transcriptional activating domain that activates said 
toxic reporter gene when the first and second hybrid polypeptides interact; 

(b) selecting said modulating compound which inhibits or permits the growth of said
recombinant host cell.

[0153] In the two methods described above any toxic reporter gene can be utilized 
including those reporter genes that can be used for negative selection including the URA3 
gene, the CYH1 gene, the CYH2 gene and the like. 

[0154] In yet another embodiment, the present invention provides a kit for screening a 
modulating compound. This kit comprises a recombinant host cell which comprises a 
reporter gene the expression of which is toxic for the recombinant host cell. The host cell is 
transformed with two vectors. The first vector comprises a polynucleotide encoding a first 
hybrid polypeptide having a DNA binding domain; and a second vector comprises a 
polynucleotide encoding a second hybrid polypeptide having a transcriptional activating 
domain that activates said toxic reporter gene when the first and second hybrid polypeptides 
interact. 

[0155] In yet another embodiment, a kit is provided for screening a modulating
compound by providing a recombinant host cell, as described in the paragraph above, but
instead of a DNA binding domain, the first vector comprises a polynucleotide encoding a first
hybrid polypeptide containing a first domain of a protein. The second vector comprises a
polynucleotide encoding a second hybrid polypeptide containing a second part of a
complementary domain of a protein that activates the toxic reporter gene when the first and
second hybrid polypeptides interact.
[0156] In the selection methods described above, the activating domain can be B42, Gal4
or VP16 (HSV) and the DNA-binding domain can be derived from Gal4 or LexA. The protein
or enzyme can be adenylate cyclase, guanylate cyclase, DHFR and the like.
[0157] Examples of modulating compounds are set forth in Table III. 
[0158] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising the modulating compounds for preventing or treating bacillary 
dysentery in a human or animal, most preferably in a mammal. 

[0159] This pharmaceutical composition comprises a pharmaceutically acceptable
amount of the modulating compound. The pharmaceutically acceptable amount can be
estimated from cell culture assays. For example, a dose can be formulated in animal
models to achieve a circulating concentration range that includes or encompasses a
concentration point or range having the desired effect in an in vitro system. This information
can thus be used to more accurately determine useful doses in other mammals, including
humans.

[0160] The therapeutically effective dose refers to that amount of the compound that
results in amelioration of symptoms in a patient. Toxicity and therapeutic efficacy of such
compounds can be determined by standard pharmaceutical procedures in cell cultures or in
experimental animals. For example, the LD50 (the dose lethal to 50% of the population) as
well as the ED50 (the dose therapeutically effective in 50% of the population) can be
determined using methods known in the art. The dose ratio between toxic and therapeutic
effects is the therapeutic index, which can be expressed as the ratio LD50/ED50. Compounds
that exhibit high therapeutic indices are preferred.
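The therapeutic index is a simple ratio, which makes it easy to rank candidate compounds. A minimal sketch with hypothetical compound names and dose values; LD50 and ED50 must be expressed in the same units (for example mg/kg).

```python
from typing import Dict, List, Tuple


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index = LD50 / ED50 (same dose units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


def rank_by_therapeutic_index(compounds: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order compound names from highest to lowest therapeutic index.

    `compounds` maps a name to its (LD50, ED50) pair.
    """
    return sorted(compounds, key=lambda name: therapeutic_index(*compounds[name]), reverse=True)
```

For example, a hypothetical compound with an LD50 of 500 mg/kg and an ED50 of 10 mg/kg has a therapeutic index of 50.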

[0161] The data obtained from the cell culture and animal studies can be used in 
formulating a range of dosage of such compounds which lies preferably within a range of 
circulating concentrations that include the ED50 with little or no toxicity. 
[0162] The pharmaceutical composition can be administered via any route, such as
locally, orally, systemically, intravenously, intramuscularly or mucosally, or using a patch,
and can be embedded or encapsulated in liposomes, microparticles, microcapsules and the
like.
[0163] Any pharmaceutically acceptable carrier or adjuvant can be used in the 
pharmaceutical composition. The modulating compound will preferably be in a soluble form 
combined with a pharmaceutically acceptable carrier. The techniques for formulating and 
administering these compounds can be found in "Remington's Pharmaceutical Sciences", 
Mack Publishing Co., Easton, PA, latest edition.

[0164] The mode of administration, optimum dosages and galenic forms can be 
determined according to criteria known in the art, taking into account the seriousness of the general 
condition of the mammal, the tolerance of the treatment and the side effects.
[0165] The present invention also relates to a method of treating or preventing bacillary 
dysentery in a human or mammal in need of such treatment. This method comprises 
administering to a mammal in need of such treatment a pharmaceutically effective amount of 
a modulating compound which binds to a targeted Shigella protein. In a preferred 
embodiment, the modulating compound is a polynucleotide, which may be placed under the 
control of a regulatory sequence that is functional in the mammal or human.
[0166] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising a SID® polypeptide, a fragment or variant thereof. The SID® 
polypeptide, fragment or variant thereof can be used in a pharmaceutical composition 
provided that it is endowed with highly specific binding properties to a bait polypeptide of 
interest. 
[0167] Owing to its original properties, the SID® polypeptide or a variant thereof interferes with 
the naturally occurring interaction between a first protein and a second protein within the 
cells of the organism, since the SID® polypeptide binds specifically to either the first 
polypeptide or the second polypeptide.

[0168] Therefore, the SID® polypeptides of the present invention or variants thereof 
interfere with protein-protein interactions between Shigella or Escherichia polypeptides and 
mammalian polypeptides.

[0169] Thus, the present invention relates to a pharmaceutical composition comprising a 
pharmaceutically acceptable amount of a SID® polypeptide or variant thereof, provided that 
the variant has the two above-mentioned characteristics, i.e., that it is endowed with highly 
specific binding properties to a bait polypeptide of interest and is devoid of the biological activity 
of the naturally occurring protein.

[0170] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising a pharmaceutically effective amount of a polynucleotide encoding a 
SID® polypeptide or a variant thereof, wherein the polynucleotide is placed under the control 
of an appropriate regulatory sequence. Appropriate regulatory sequences include 
polynucleotide sequences derived from promoter elements and the like.

[0171] Polynucleotides that can be used in the pharmaceutical composition of the 
present invention include the nucleotide sequences of the SID®s of SEQ ID Nos. 15 to 215.

[0172] Besides the SID® polypeptides and polynucleotides, the pharmaceutical 
composition of the present invention can also include a recombinant expression vector 
comprising the polynucleotide encoding the SID® polypeptide, fragment or variant thereof. 
[0173] The pharmaceutical compositions described above can be administered by any 
route, such as orally, systemically, intravenously, intramuscularly, intradermally or mucosally, in 
encapsulated form, using a patch and the like. Any pharmaceutically acceptable carrier or 
adjuvant can be used in this pharmaceutical composition.

[0174] The SID® polypeptides as active ingredients will preferably be in a soluble form 
combined with a pharmaceutically acceptable carrier. The techniques for formulating and 
administering these compounds can be found in "Remington's Pharmaceutical Sciences", 
supra.

[0175] The pharmaceutically acceptable amount of SID® polypeptides can be 
determined as described above for the modulating compounds, using cell culture and animal 
models.

[0176] Such compounds can be used in a pharmaceutical composition to treat or 
prevent bacillary dysentery. 

[0177] Thus, the present invention also relates to a method of preventing or treating 
bacillary dysentery in a mammal, said method comprising the step of administering to a 
mammal in need of such treatment a pharmaceutically effective amount of a recombinant 
expression vector comprising a polynucleotide encoding a SID® polypeptide which binds 
either to a Shigella flexneri protein or to a human placenta protein involved in a 
protein-protein interaction between a Shigella flexneri protein and a human placenta protein. More 
specifically, the present invention relates to a method of preventing or treating bacillary 
dysentery in a mammal, said method comprising the step of administering to a mammal in 
need of such treatment a pharmaceutically effective amount of:

(1) a SID® polypeptide of SEQ ID Nos. 216 to 416 or a variant thereof which binds to a 
targeted Shigella flexneri protein or human placenta protein; or 

(2) a SID® polynucleotide of SEQ ID Nos. 15 to 215, or a variant or a fragment thereof, 
encoding a SID® polypeptide, wherein said polynucleotide is placed under the control of a 
regulatory sequence which is functional in said mammal; or

(3) a recombinant expression vector comprising a polynucleotide encoding a SID® 
polypeptide which binds either to a Shigella flexneri protein or to a human placenta protein 
involved in a protein-protein interaction between a Shigella flexneri protein and a human 
placenta protein.

[0178] In another embodiment of the present invention, nucleic acids comprising a 
sequence of SEQ ID Nos. 15 to 215, which encode the proteins of SEQ ID Nos. 
216 to 416, and/or functional derivatives thereof are administered by way of gene therapy 
to modulate the function of the complexes of Table II. Any of the methodologies relating to 
gene therapy available in the art may be used in the practice of the present invention, such as 
those described by Goldspiel et al., Clin. Pharm., 12, pgs. 488-505 (1993).

[0179] Delivery of the therapeutic nucleic acid into a patient may be direct in vivo gene 
therapy (i.e., the patient is directly exposed to the nucleic acid or nucleic acid-containing 
vector) or indirect ex vivo gene therapy (i.e., cells are first transformed with the nucleic acid 
in vitro and then transplanted into the patient). 

[0180] For example, for in vivo gene therapy, an expression vector containing the nucleic 
acid is administered in such a manner that it becomes intracellular, e.g., by infection using a 
defective or attenuated retroviral or other viral vector as described, for example, in U.S. 
Patent 4,980,286 or by Robbins et al., Pharmacol. Ther., 80, No. 1, pgs. 35-47 (1998).

[0181] Retroviral vectors known in the art that can be used include those 
described in Miller et al., Meth. Enzymol., 217, pgs. 581-599 (1993), which have been modified 
to delete the retroviral sequences that are not required for packaging of the viral 
genome and subsequent integration into host cell DNA. Adenoviral vectors, which are 
advantageous because of their ability to infect non-dividing cells, can also be used; 
high-capacity adenoviral vectors are described in Kochanek, Human Gene Therapy, 10, pgs. 
2451-2459 (1999). Chimeric viral vectors that can be used are those described by Reynolds 
et al., Molecular Medicine Today, pgs. 25-31 (1999). Hybrid vectors can also be used and 
are described by Jacoby et al., Gene Therapy, 4, pgs. 1282-1283 (1997).
[0182] Direct injection of naked DNA, microparticle bombardment 
(e.g., Gene Gun®; Biolistic, DuPont) or coating of the DNA with lipids can also be used in gene 
therapy. Cell-surface receptors or transfecting agents, encapsulation in liposomes, 
microparticles or microcapsules, administration of the nucleic acid linked to a peptide 
known to enter the nucleus, or administration of the nucleic acid linked to a ligand subject 
to receptor-mediated endocytosis (see Wu & Wu, J. Biol. Chem., 262, pgs. 4429-4432 
(1987)) can be used to target cell types that specifically express the receptors of interest.
[0183] In another embodiment, a nucleic acid-ligand compound may be produced in 
which the ligand comprises a fusogenic viral peptide designed to disrupt endosomes, 
thus allowing the nucleic acid to avoid subsequent lysosomal degradation. The nucleic acid 
may be targeted in vivo for cell-specific endocytosis and expression by targeting a specific 
receptor, such as described in WO 92/06180, WO 93/14188 and WO 93/20221. 
Alternatively, the nucleic acid may be introduced intracellularly and incorporated within the 
host cell genome for expression by homologous recombination. See Zijlstra et al., Nature, 
342, pgs. 435-438 (1989).

[0184] In ex vivo gene therapy, a gene is transferred into cells in vitro using tissue culture, and 
the cells are delivered to the patient by various methods, such as subcutaneous injection, 
application of the cells in a skin graft, or intravenous injection of recombinant blood 
cells such as hematopoietic stem or progenitor cells.

[0185] Cells into which a nucleic acid can be introduced for the purposes of gene 
therapy include, for example, epithelial cells, endothelial cells, keratinocytes, fibroblasts, 
muscle cells, hepatocytes and blood cells. The blood cells that can be used include, for 
example, T-lymphocytes, B-lymphocytes, monocytes, macrophages, neutrophils, 
eosinophils, megakaryocytes, granulocytes, hematopoietic cells or progenitor cells and the 
like. 

[0186] In yet another embodiment, the present invention relates to protein chips or 
protein microarrays. It is well known in the art that microarrays can contain more than 
10,000 protein spots robotically deposited on the surface of a glass slide or 
nylon filter. The proteins attach covalently to the slide surface, yet retain their ability to 
interact with other proteins or small molecules in solution. In some instances, the protein 
samples can be made to adhere to glass slides by coating the slides with an 
aldehyde-containing reagent that attaches to primary amines. A process for creating microarrays is 
described, for example, by MacBeath and Schreiber, Science, Vol. 289, Number 5485, 
pgs. 1760-1763 (2000), or Service, Science, Vol. 289, Number 5485, pg. 1673 (2000). An 
apparatus for controlling, dispensing and measuring small quantities of fluid is described, for 
example, in U.S. Patent No. 6,112,605.

[0187] The present invention also provides a record of protein-protein interactions, 
PIM®s, SID®s and any data encompassed in the following Tables. It will be appreciated 
that this record can be provided in paper, electronic or digital form.

[0188] In order to fully illustrate the present invention and its advantages, the 
following specific examples are given, it being understood that they are intended only 
as illustrative and in no way limitative.
EXAMPLES 

EXAMPLE 1: Preparation of a collection of random-primed cDNA fragments
1.A. Collection preparation and transformation in Escherichia coli
1.A.1. Random-primed cDNA fragment preparation

[0189] For the human placenta mRNA sample, random-primed cDNA was prepared 
from 5 µg of polyA+ mRNA using a TimeSaver cDNA Synthesis Kit (Amersham Pharmacia 
Biotech) and 5 µg of random N9-mers, according to the manufacturer's instructions. 
Following phenol extraction, the cDNA was precipitated and resuspended in water. The 
resuspended cDNA was phosphorylated by incubation with T4 DNA kinase 
(Biolabs) and ATP for 30 minutes at 37°C. The resulting phosphorylated cDNA was then 
purified over a separation column (Chromaspin TE 400, Clontech), according to the 
manufacturer's protocol.

1.A.2. Ligation of linkers to blunt-ended cDNA

Oligonucleotide HGX931 (5' end phosphorylated) 1 µg/µl and HGX932 1 µg/µl.
Sequence of the oligo HGX931: 5'-GGGCCACGAA-3' (SEQ ID NO. 417)
Sequence of the oligo HGX932: 5'-TTCGTGGCCCCTG-3' (SEQ ID NO. 418)
[0190] Linkers were preincubated (5 minutes at 95°C, 10 minutes at 68°C, 15 minutes at 
42°C), then cooled to room temperature and ligated with the cDNA fragments at 16°C 
overnight.

[0191] Linkers were removed on a separation column (Chromaspin TE 400, Clontech), 
according to the manufacturer's protocol. 
1.A.3. Vector preparation

[0192] Plasmid pP6 (see Figure 10) was prepared by replacing the SpeI/XhoI fragment 
of pGAD3S2X with the double-stranded oligonucleotide:

5' CTAGCCATGGCCGCAGGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAAGGGCCACTGGGGCCCCC 3'
3' GGTACCGGCGTCCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTTCCCGGTGACCCCGGGGGAGCT 5' (SEQ ID NO. 419)
[0193] The pP6 vector was successively digested with SfiI and BamHI restriction 
enzymes (Biolabs) for 1 hour at 37°C, extracted, precipitated and resuspended in water. 
The digested plasmid vector backbones were purified on a separation column 
(Chromaspin TE 400, Clontech), according to the manufacturer's protocol.
1.A.4. Ligation between vector and insert of cDNA

[0194] The prepared vector was ligated overnight at 15°C with the blunt-ended cDNA 
described in section 1.A.2 using T4 DNA ligase (Biolabs). The DNA was then precipitated and 
resuspended in water.

1.A.5. Library transformation in Escherichia coli 

[0195] The DNA from section 1.A.4 was transformed into Electromax DH10B 
electrocompetent cells (Gibco BRL) with a Cell Porator apparatus (Gibco BRL). 1 ml SOC 
medium was added and the transformed cells were incubated at 37°C for 1 hour. 9 ml of 
SOC medium per tube were added and the cells were plated on LB+ampicillin medium. The 
colonies were scraped with liquid LB medium, aliquoted and frozen at -80°C.

[0196] The obtained collection of recombinant cell clones is named HGXBPLARP1.

1.B. Collection transformation in Saccharomyces cerevisiae

[0197] The Saccharomyces cerevisiae strain Y187 (MATα Gal4Δ Gal80Δ ade2-101, 
his3, leu2-3,-112, trp1-901, ura3-52, URA3::UASGAL1-LacZ, Met) was transformed with the 
cDNA library.

[0198] The plasmid DNA contained in E. coli was extracted (Qiagen) from aliquots of frozen 
E. coli cells (1.A.5.). Saccharomyces cerevisiae Y187 yeast cells were grown in YPGlu.

[0199] Yeast transformation was performed according to a standard protocol (Gietz et al., 
Yeast, 11, 355-360, 1995) using yeast carrier DNA (Clontech). This experiment yields 10^4 
to 5 × 10^4 cells/µg DNA. 2 × 10^4 cells were spread per plate on DO-Leu medium. The cells 
were aliquoted into vials containing 1 ml of cells and frozen at -80°C.

[0200] The obtained collection of recombinant cell clones is named HGXYPLARP1 
(placenta).

1.C. Construction of bait plasmids 

[0201] For fusions of the bait protein (listed in Table II) to the DNA-binding domain of the 
GAL4 protein of S. cerevisiae, bait fragments were cloned into plasmid pB6. For fusions of 
the bait protein to the DNA-binding domain of the LexA protein of E. coli, bait fragments were 
cloned into plasmid pB20. 

[0202] Plasmid pB6 (see Figure 3) was prepared by replacing the NcoI/SalI polylinker 
fragment of pASΔΔ with the double-stranded DNA fragment:

5' CATGGCCGGACGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAAGGGCCACTGGGGCCCCC 3' (SEQ ID NO. 420)
3' CGGCCTGCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTTCCCGGTGACCCCGGGGGAGCT 5' (SEQ ID NO. 421)

[0203] Plasmid pB20 (see Figure 6) was prepared by replacing the EcoRI/PstI polylinker 
fragment of pLex10 with the double-stranded DNA fragment:

5' AATTCGGGGCCGGACGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAGGGCCACTGGGGCCCCTCGACCTGCA 3' (SEQ ID NO. 422)
3' GCCCCGGCCTGCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTCCCGGTGACCCCGGGGAGCTGG 5' (SEQ ID NO. 423)

[0204] The bait ORF was amplified by PCR using the Pfu proofreading DNA 
polymerase (Stratagene), 10 pmol of each specific amplification primer and 
200 ng of plasmid DNA as template.

[0205] The PCR program was set up as follows:

94°C 45 seconds
94°C 45 seconds
48°C 45 seconds x 30 cycles
72°C 6 minutes
72°C 10 minutes
15°C ∞
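To make the structure of this program explicit, the sketch below represents it as a list of temperature/time steps and sums the hold time. It assumes, by analogy with the colony-PCR program of Example 3, that the 94°C / 48°C / 72°C (6 minute) steps form the repeated cycle; ramp times and the final indefinite 15°C hold are ignored. The `Step` class and `hold_time_seconds` function are illustrative names, not part of any thermocycler software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time at that temperature


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Total hold time of a thermocycler program in seconds.

    Ramp times and the final indefinite 15 C hold are not counted.
    """
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


# Assumed grouping: one initial denaturation, a 3-step cycle, one final extension.
initial = [Step(94, 45)]
cycle = [Step(94, 45), Step(48, 45), Step(72, 6 * 60)]
final = [Step(72, 10 * 60)]

total = hold_time_seconds(initial, cycle, 30, final)
print(total)  # 14145 seconds, i.e. just under 4 hours of hold time
```

The 6-minute extension per cycle dominates the run time, which is consistent with amplifying full-length bait ORFs.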

[0206] The amplification was checked by agarose gel electrophoresis. 
[0207] The PCR fragments were purified with a Qiaquick column (Qiagen) according to 
the manufacturer's protocol. 

[0208] The purified PCR fragments were digested with the appropriate restriction enzymes. The 
digested PCR fragments were purified with a Qiaquick column (Qiagen) according to the manufacturer's 
protocol. 

[0209] The digested PCR fragments were ligated into an appropriately digested and 
dephosphorylated bait vector (pB6 or pB20) according to a standard protocol (Sambrook et al.) 
and were transformed into competent bacterial cells. The cells were grown, the DNA was 
extracted and the plasmid was sequenced.

Example 2 : Screening the collection with the two-hybrid in yeast system 
2.A. The mating protocol 

[0210] The mating two-hybrid system in yeast (as described by Fromont-Racine et al., Nature 
Genetics, vol. 16, 277-282 (1997), "Toward a functional analysis of the yeast genome through 
exhaustive two-hybrid screens") was used for its advantages, but the cDNA collection could 
also be screened in the classical two-hybrid system as described by Fields et al. or in a yeast 
reverse two-hybrid system.

[0211] The mating procedure allows a direct selection on selective plates because the 
two fusion proteins are already produced in the parental cells. No replica plating is required. 
[0212] This protocol was written for the use of the library transformed into the Y187 
strain. 

[0213] For bait proteins fused to the DNA-binding domain of GAL4, bait-encoding 
plasmids were first transformed into S. cerevisiae strain CG1945 (MATa Gal4-542 Gal80-538 
ade2-101 his3Δ200, leu2-3,112, trp1-901, ura3-52, lys2-801, URA3::GAL4 17mers (x3)-
CYC1 TATA-LacZ, LYS2::GAL1 UAS-GAL1 TATA-HIS3 CYHR) according to step 1.B. and 
spread on DO-Trp medium.

[0214] For bait proteins fused to the DNA-binding domain of LexA, bait-encoding 
plasmids were first transformed into S. cerevisiae strain L40Δgal4 (MATa ade2, trp1-901, 
leu2-3,112, lys2-801, his3Δ200, LYS2::(lexAop)4-HIS3, ura3-52::URA3 (lexAop)8-LacZ, 
GAL4::KanR) according to step 1.B. and spread on DO-Trp medium.

Day 1, morning: preculture

[0215] The cells carrying the bait plasmid obtained at step 1.C. were precultured in 
20 ml DO-Trp medium and grown at 30°C with vigorous agitation.

Day 1, late afternoon: culture

[0216] The OD600nm of the DO-Trp preculture of cells carrying the bait plasmid was 
measured. The OD600nm reading must lie between 0.1 and 0.5 to fall within the linear range 
of measurement. 50 ml DO-Trp was inoculated at an OD600nm of 0.006 and grown 
overnight at 30°C with vigorous agitation.
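The arithmetic of this step can be sketched as follows. The sketch assumes that a dense preculture is diluted so that the spectrophotometer reading falls in the 0.1-0.5 linear range, and that the true culture OD600nm is the reading multiplied by the dilution factor; the inoculum volume then follows from C1·V1 = C2·V2. Function names and the example reading are hypothetical.

```python
def true_od600(reading: float, dilution_factor: float) -> float:
    """Convert a spectrophotometer reading of a diluted sample into the culture OD600."""
    # The text requires the reading itself to lie in the 0.1-0.5 linear range.
    if not 0.1 <= reading <= 0.5:
        raise ValueError("reading outside the 0.1-0.5 linear range; adjust the dilution")
    return reading * dilution_factor


def inoculum_ml(culture_od600: float, target_od600: float = 0.006,
                final_volume_ml: float = 50.0) -> float:
    """Volume of preculture giving target_od600 in final_volume_ml (C1*V1 = C2*V2)."""
    if culture_od600 <= 0:
        raise ValueError("culture OD600 must be positive")
    return target_od600 * final_volume_ml / culture_od600


# Hypothetical example: a 1/10 dilution of the preculture reads 0.25.
od = true_od600(0.25, 10)          # preculture OD600 of 2.5
print(round(inoculum_ml(od), 3))   # 0.12 ml of preculture into 50 ml DO-Trp
```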
Day 2: mating
Media and plates:

1 YPGlu 15 cm plate
50 ml tube with 13 ml DO-Leu-Trp-His
100 ml flask with 5 ml of YPGlu
8 DO-Leu-Trp-His plates
2 DO-Leu plates
2 DO-Trp plates
2 DO-Leu-Trp plates

[0217] The OD600nm of the DO-Trp culture was measured. It should be around 1. 
[0218] For the mating, twice as many bait cells as library cells were used. To obtain a good 
mating efficiency, the cells must be collected at 10^8 cells per cm^2.
[0219] The volume of bait culture (in ml) corresponding to 50 OD600nm units for the 
mating with the prey library was estimated.
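One OD600nm unit is conventionally the amount of cells in 1 ml of culture at OD600nm 1.0, so the volume needed for 50 units is simply 50 divided by the measured OD. A minimal sketch, with hypothetical culture densities:

```python
def bait_volume_ml(culture_od600: float, od_units: float = 50.0) -> float:
    """Volume of culture (ml) holding the requested OD600 units (1 unit = 1 ml at OD600 1)."""
    if culture_od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / culture_od600


# Hypothetical overnight culture densities; an OD600 of about 1 gives about 50 ml.
for od in (0.8, 1.0, 1.25):
    print(od, round(bait_volume_ml(od), 1))
```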

[0220] A vial containing the HGXYCDNA1 library was thawed slowly on ice. 1.0 ml of the 
vial was added to 5 ml YPGlu. The cells were allowed to recover at 30°C under gentle agitation 
for 10 minutes.
Mating 

[0221] The 50 OD600nm units of bait culture were placed into a 50 ml Falcon tube. 
[0222] The HGXYCDNA1 library culture was added to the bait culture and the mixture was 
centrifuged; the supernatant was discarded and the cells were resuspended in 1.6 ml YPGlu medium.

[0223] The cells were distributed onto two 15 cm YPGlu plates with glass beads and 
spread by shaking the plates. The plates were incubated cells-up at 30°C for 4 h 30 min.

Collection of mated cells 

[0224] The plates were washed and rinsed with 6 ml and 7 ml, respectively, of 
DO-Leu-Trp-His. Two parallel serial ten-fold dilutions were performed in 500 µl DO-Leu-Trp-His up to 
1/10,000. 50 µl of each 1/10,000 dilution was spread onto DO-Leu and DO-Trp plates and 
50 µl of each 1/1,000 dilution onto DO-Leu-Trp plates. 22.4 ml of collected cells were spread 
in 400 µl aliquots on DO-Leu-Trp-His+Tet plates.
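The control plates above allow the cell densities to be back-calculated with the usual colony-count formula, cfu/ml = colonies / (dilution × volume plated). The sketch below applies it to hypothetical colony counts; only the dilutions, the 50 µl plating volume and the 22.4 ml spread volume come from the text. Expressing mating efficiency as diploids relative to the limiting partner (here assumed to be the library cells counted on DO-Leu) is a common convention, not a formula stated in this protocol.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ml: float) -> float:
    """Colony-forming units per ml of the undiluted suspension.

    dilution is the dilution as a fraction, e.g. 1e-4 for the 1/10,000 dilution.
    """
    return colonies / (dilution * plated_ml)


# Hypothetical colony counts from 50 µl (0.05 ml) spreads.
leu_per_ml = cfu_per_ml(120, 1e-4, 0.05)       # DO-Leu plate, 1/10,000 dilution
diploid_per_ml = cfu_per_ml(90, 1e-3, 0.05)    # DO-Leu-Trp plate, 1/1,000 dilution

collected_ml = 22.4                            # volume spread on selective plates
diploids_screened = diploid_per_ml * collected_ml

# Assumed convention: diploids relative to the limiting (library) partner.
mating_efficiency = diploid_per_ml / leu_per_ml
print(f"{diploids_screened:.3g} diploids screened, "
      f"{mating_efficiency:.1%} mating efficiency")
```

The number of diploids screened is the figure to compare with the 60 × 10^6 Trp+Leu+ reference used when choosing the follow-up protocol.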
Day 4

[0225] Clones that were able to grow on DO-Leu-Trp-His+Tetracyclin were then 
selected. This medium allows the isolation of diploid clones presenting an interaction. 
[0226] The His+ colonies were counted on control plates.

[0227] The number of His+ cell clones defines which protocol is to be followed: 
[0228] For 60 × 10^6 Trp+Leu+ colonies:

- if the number of His+ cell clones is <285: process all colonies with the luminometry 
protocol (2.C);

- if the number of His+ cell clones is >285 and <5,000: process via the overlay and then the 
luminometry protocols on blue colonies (2.B and 2.C);

- if the number of His+ cell clones is >5,000: repeat the screen using DO-Leu-Trp-His+Tetracyclin 
plates containing 3-aminotriazole.
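The decision rule above can be written as a small function. The thresholds are the ones given for a screen of 60 × 10^6 Trp+Leu+ colonies; the text leaves the exact values 285 and 5,000 unassigned, so this sketch assumes that 285 falls in the overlay branch and 5,000 in the repeat branch. The returned strings are illustrative labels only.

```python
def choose_protocol(his_positive_clones: int) -> str:
    """Pick the follow-up step from the number of His+ clones.

    Thresholds are those stated for a screen of 60 x 10^6 Trp+Leu+ colonies.
    Assumption: exactly 285 goes to the overlay branch, exactly 5,000 to the
    repeat branch (the text does not specify the boundary values).
    """
    if his_positive_clones < 0:
        raise ValueError("count cannot be negative")
    if his_positive_clones < 285:
        return "luminometry (2.C) on all colonies"
    if his_positive_clones < 5000:
        return "X-Gal overlay (2.B), then luminometry (2.C) on blue colonies"
    return "repeat screen on DO-Leu-Trp-His+Tet plates with 3-aminotriazole"


for n in (120, 1800, 12000):   # hypothetical His+ counts
    print(n, "->", choose_protocol(n))
```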

2.B. The X-Gal overlay assay

[0229] The X-Gal overlay assay was performed directly on the selective medium plates 
after scoring the number of His+ colonies.
Materials 

[0230] A water bath was set up. The water temperature should be 50°C. 
0.5 M Na2HPO4 pH 7.5. 
1.2% Bacto-agar. 
2% X-Gal in DMF.

Overlay mixture: 0.25 M Na2HPO4 pH 7.5, 0.5% agar, 0.1% SDS, 7% DMF (LABOSI), 0.04% 
X-Gal (ICN). For each plate, 10 ml of overlay mixture is needed. 
DO-Leu-Trp-His plates. 
Sterile toothpicks.
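Assuming the overlay mixture is prepared from the three stock solutions listed above, the stock volumes follow from C1·V1 = C2·V2. The sketch below computes them for a given number of 10 ml plates. SDS and the remaining DMF are not modeled because their stock concentrations are not given; note that the 2% X-Gal stock already supplies 2% of the 7% DMF.

```python
def stock_volume_ml(stock_conc: float, final_conc: float, final_volume_ml: float) -> float:
    """C1*V1 = C2*V2, solved for V1 (both concentrations in the same units)."""
    return final_conc * final_volume_ml / stock_conc


def overlay_recipe(n_plates: int, ml_per_plate: float = 10.0) -> dict:
    """Stock volumes (ml) for n_plates of overlay mixture; SDS/extra DMF not included."""
    total = n_plates * ml_per_plate
    return {
        "0.5 M Na2HPO4 pH 7.5": stock_volume_ml(0.5, 0.25, total),
        "1.2% Bacto-agar": stock_volume_ml(1.2, 0.5, total),
        "2% X-Gal in DMF": stock_volume_ml(2.0, 0.04, total),
    }


for stock, ml in overlay_recipe(8).items():   # e.g. the 8 DO-Leu-Trp-His plates
    print(f"{stock}: {ml:.2f} ml")
```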
Experiment 

[0231] The temperature of the overlay mix should be between 45°C and 50°C. The 
overlay mix was poured over the plates in portions of 10 ml. When the top layer had settled, 
the plates were collected and incubated overlay-up at 30°C, and the time was noted. 
The plates were checked regularly for blue colonies. If no blue colony appeared, incubation 
was continued overnight. The positives were marked with a pen and counted. The positive 
colonies were streaked on fresh DO-Leu-Trp-His plates with a sterile toothpick.

2.C. The luminometry assay

[0232] His+ colonies were grown overnight at 30°C with shaking in microtiter plates 
containing DO-Leu-Trp-His+Tetracyclin medium. The next day, the overnight culture was 
diluted 15-fold into a new microtiter plate containing the same medium and was incubated 
for 5 hours at 30°C with shaking. The samples were diluted 5-fold and the OD600nm was read. 
The samples were diluted again to obtain between 10,000 and 75,000 yeast cells/well in a 
final volume of 100 µl.

[0233] Per well, 76 µl of One Step Yeast Lysis Buffer (Tropix), 20 µl of 
Sapphire-II Enhancer (Tropix) and 4 µl of Galacton-Star (Tropix) were added, and the plate was 
incubated for 40 minutes at 30°C. The β-Gal read-out (L) was measured using a luminometer 
(Trilux, Wallac). The value of (OD600nm × L) was calculated and the interacting preys having 
the highest values were selected.
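The selection step can be sketched as a sort on the (OD600nm × L) score named in the text. How many preys are kept is not specified, so `top_n` is a hypothetical parameter, and the well readings below are invented for illustration.

```python
def rank_preys(wells, top_n):
    """Rank wells by OD600 x L, the score named in the text, and keep the top_n prey IDs.

    wells: iterable of (prey_id, od600, luminescence) tuples.
    """
    scored = sorted(wells, key=lambda w: w[1] * w[2], reverse=True)
    return [prey_id for prey_id, _, _ in scored[:top_n]]


# Hypothetical plate readings (prey ID, OD600, luminometer counts).
wells = [("prey_01", 0.8, 1000), ("prey_02", 0.5, 3000), ("prey_03", 1.2, 500)]
print(rank_preys(wells, top_n=2))  # ['prey_02', 'prey_01']
```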
[0234] At this step of the protocol, diploid cell clones presenting an interaction were 
isolated. The next step was to identify the polypeptides involved in the selected interactions.
Example 3 : Identification of positive clones 

3. A. PCR on yeast colonies 
Introduction 

[0235] PCR amplification of fragments of plasmid DNA directly on yeast colonies is a 
quick and efficient procedure to identify the sequences cloned into this plasmid. It is directly 
derived from a published protocol (Wang H. et al., Analytical Biochemistry, 237, 145-146 
(1996)).

[0236] However, it is not a standardized protocol: it varies from strain to strain and 
depends on experimental conditions (number of cells, Taq polymerase source, etc.). This 
protocol should be optimized for specific local conditions.
Materials 

[0237] For 1 well, the PCR mix composition was:
32.5 µl water,
5 µl 10X PCR buffer (Pharmacia),
1 µl dNTP 10 mM,
0.5 µl Taq polymerase (5 U/µl) (Pharmacia),
0.5 µl oligonucleotide ABS1 10 pmol/µl: 5'-GCGTTTGGAATCACTACAGG-3' (SEQ ID NO. 424),
0.5 µl oligonucleotide ABS2 10 pmol/µl: 5'-CACGATGCACGTTGAAGTG-3' (SEQ ID NO. 425).

1 N NaOH.
Experiment 

[0238] The positive colonies were grown overnight at 30°C with shaking in a 96-well cell culture 
cluster (Costar) containing 150 µl DO-Leu-Trp-His+Tetracyclin. The culture 
was resuspended and 100 µl was transferred immediately to a Thermowell 96 (Costar) plate and 
centrifuged for 5 minutes at 4,000 rpm at room temperature. The supernatant was removed. 
5 µl NaOH was added to each well and the plate was shaken for 1 minute.

[0239] The Thermowell plate was placed in the thermocycler (GeneAmp 9700, Perkin Elmer) 
for 5 minutes at 99.9°C and then 10 minutes at 4°C. The PCR mix was added to each well 
and mixed well by shaking.

[0240] The PCR program was set up as follows:

94°C 3 minutes
94°C 30 seconds
53°C 1 minute 30 seconds x 35 cycles
72°C 3 minutes
72°C 5 minutes
15°C ∞

[0241] The quality, quantity and length of the PCR fragment were checked on an 
agarose gel. The length of the cloned fragment was estimated as the length of the PCR 
fragment minus 300 base pairs, corresponding to the amplified flanking plasmid 
sequences.
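The insert-size estimate above is a single subtraction; the sketch below applies it to hypothetical band sizes and rejects products too short to contain an insert. The 300 bp constant is taken from the text; the function name is illustrative.

```python
FLANKING_BP = 300  # total amplified plasmid sequence flanking the insert, per the text


def insert_length_bp(pcr_fragment_bp: int) -> int:
    """Estimated cloned-fragment length from the PCR product size on the gel."""
    if pcr_fragment_bp <= FLANKING_BP:
        raise ValueError("PCR product no larger than the flanking sequence: no insert")
    return pcr_fragment_bp - FLANKING_BP


# Hypothetical band sizes read off the agarose gel.
for band in (650, 1100, 2300):
    print(band, "bp product ->", insert_length_bp(band), "bp insert")
```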

[0242] 3.B. Plasmid rescue from yeast by electroporation
Introduction

[0243] The PCR on yeast cells described above may not be successful; in such a 
case, plasmids can be rescued from yeast by electroporation. This experiment allows the 
recovery of prey plasmids from yeast cells by transformation of E. coli with a yeast cellular 
extract. The prey plasmid can then be amplified and the cloned fragment can be sequenced.
Materials
[0244] Plasmid rescue

Glass beads 425-600 µm (Sigma)
Phenol/chloroform (1/1) premixed with isoamyl alcohol (Amresco)

Extraction buffer: 2% Triton X100, 1% SDS, 100 mM NaCl, 10 mM Tris-HCl pH 8.0, 1 mM 
EDTA pH 8.0.

Ethanol/NH4Ac mix: 6 volumes ethanol with 7.5 M NH4 acetate
70% ethanol
Yeast cells in patches on plates

Electroporation

SOC medium

M9 medium

Selective plates: M9-Leu+Ampicillin

2 mm electroporation cuvettes (Eurogentec)

Experiment

Plasmid rescue

[0245] The cell patch on DO-Leu-Trp-His was prepared with the cell culture of section 
2.C. The cells of each patch were scraped into an Eppendorf tube, 300 µl of glass beads were 
added to each tube, and then 200 µl extraction buffer and 200 µl phenol:chloroform:isoamyl 
alcohol (25:24:1) were added.

[0246] The tubes were centrifuged for 10 minutes at 15,000 rpm.

[0247] 180 µl of supernatant was transferred to a sterile Eppendorf tube, 500 µl of 
ethanol/NH4Ac was added and the tubes were vortexed. The tubes were centrifuged for 15 
minutes at 15,000 rpm at 4°C. The pellet was washed with 200 µl 70% ethanol, the 
ethanol was removed and the pellet was dried. The pellet was resuspended in 10 µl water. 
Extracts were stored at -20°C.
Electroporation
Materials:

[0248] Electrocompetent MC1066 cells prepared according to standard protocols 
(Sambrook et al. supra).

1 µl of yeast plasmid DNA extract was added to a pre-chilled Eppendorf tube and kept on 
ice.

20 µl of electrocompetent cells were added to the 1 µl yeast plasmid DNA extract sample, 
mixed and transferred to a cold electroporation cuvette. The Bio-Rad electroporator was set to 
200 ohms resistance, 25 µF capacitance and 2.5 kV. The cuvette was placed in the cuvette 
holder and electroporated.

1 ml of SOC was added into the cuvette and the cell mix was transferred into a sterile 
Eppendorf tube. The cells were allowed to recover for 30 minutes at 37°C, then spun down for 
1 minute at 4,000 x g and the supernatant was poured off. About 100 µl of medium was kept 
and used to resuspend the cells, which were spread on selective plates (e.g., M9-Leu plates). 
The plates were then incubated for 36 hours at 37°C.

[0249] One colony was grown and the plasmids were extracted. The presence and size of 
the insert were checked by enzymatic digestion and agarose gel electrophoresis. The 
insert was then sequenced.
Example 4 : Protein-protein interaction 

[0250] For each bait, the previous protocol leads to the identification of prey 
polynucleotide sequences. Using a suitable software program (e.g., Blastwun, available on 
the Internet site of the University of Washington 
http://bioweb.pasteur.fr/seqanal/interfaces/blastwu.html), the identity of the mRNA transcript 
encoded by the prey fragment can be determined, as well as whether the encoded fusion 
protein is in the same open reading frame of translation as the predicted protein. 
[0251] Alternatively, prey nucleotide sequences can be compared with one another, and 
those that share identity over a significant region (60 nt) can be grouped together to form a 
contiguous sequence (contig) whose identity can be ascertained in the same manner as for 
the individual prey fragments described above.
Example 5 : Identification of SID® 

[0252] By comparing all isolated fragments that are included in the same polypeptide and 
selecting their intersection, one can define the Selected Interacting Domain (SID®), as 
illustrated in Figure 15. The SID® is illustrated in Table III.
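Under the simplifying assumption that each isolated prey fragment is described by its start and end residue on the same polypeptide, the intersection described above reduces to taking the largest start and the smallest end. The sketch below does this with hypothetical coordinates; how the actual procedure of Figure 15 treats fragments that do not all overlap is not stated, so returning `None` in that case is an assumption.

```python
def selected_interacting_domain(fragments):
    """Common region of prey fragments mapped onto the same polypeptide.

    fragments: (start, end) residue positions, 1-based and inclusive.
    Returns (start, end) of the intersection, or None if the fragments
    do not all overlap.
    """
    if not fragments:
        raise ValueError("need at least one fragment")
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None


# Hypothetical fragment coordinates from three independent prey clones.
print(selected_interacting_domain([(120, 310), (95, 280), (140, 330)]))  # (140, 280)
```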
Example 6 : Identification of PIM® 

[0253] The PIM® is then constructed using methods known in the art as exemplified in 
Figure 16. 

Example 7 : Making of polyclonal and monoclonal antibodies 

[0254] The protein-protein complex of columns 1 and 3 of Table II was injected into mice 
and polyclonal and monoclonal antibodies were made following the procedure set forth in 
Sambrook et al. (supra). 

[0255] More specifically, mice are immunized with an immunogen comprising Table II 
complexes conjugated to keyhole limpet hemocyanin using glutaraldehyde or EDC as is well 
known in the art. The complexes can also be stabilized by crosslinking as described in WO 
00/37483. The immunogen is then mixed with an adjuvant. Each mouse receives four 
injections of 10 µg to 100 µg of immunogen, and after the fourth injection, blood samples are 
taken from the mice to determine if the serum contains antibodies to the immunogen. Serum 
titer is determined by ELISA or RIA. Mice with sera indicating the presence of antibody to 
the immunogen are selected for hybridoma production. 
[0256] Spleens are removed from immune mice and a single-cell suspension is prepared 
(Harlow et al 1988). Cell fusions are performed essentially as described by Kohler et al 
(1976). Briefly, P365.3 myeloma cells (ATCC, Rockville, Md) or NS-1 myeloma cells are 
fused with spleen cells using polyethylene glycol as described by Harlow et al (1989). Cells 
are plated at a density of 2 × 10^5 cells/well in 96-well tissue culture plates. Individual wells 
are examined for growth, and the supernatants of wells with growth are tested for the 
presence of complex-specific antibodies by ELISA or RIA using one of the proteins set 
forth in Table II as a target protein. Cells in positive wells are expanded and subcloned to 
establish and confirm monoclonality.

[0257] Clones with the desired specificities are expanded and grown as ascites in mice 
or in a hollow fiber system to produce sufficient quantities of antibodies for characterization 
and assay development. Antibodies are tested for binding to each individual protein in Table II
to determine which are specific for the Table II complexes as opposed to those that bind to
the individual proteins. More specifically, antibodies are tested for binding to the bait polypeptide
of column 1 of Table II alone and to the prey polypeptide of column 3 of Table II alone, to
determine which are specific for the protein-protein complex of columns 1 and 3 of Table II
as opposed to those that bind to the individual proteins.
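The selection rule described above (keep clones that bind the bait/prey complex but neither partner alone) can be sketched as a simple decision function. This assumes each ELISA or RIA result has already been reduced to a positive/negative call; the `BindingResult` record, its field names, and the category labels are hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingResult:
    """Hypothetical positive/negative binding calls for one antibody clone."""
    clone: str
    on_complex: bool  # binds the complex of columns 1 and 3 of Table II
    on_bait: bool     # binds the column 1 bait polypeptide alone
    on_prey: bool     # binds the column 3 prey polypeptide alone


def classify(r: BindingResult) -> str:
    """Label a clone according to the specificity test described above."""
    if r.on_complex and not (r.on_bait or r.on_prey):
        return "complex-specific"
    if r.on_bait or r.on_prey:
        return "binds individual protein"
    return "non-binder"


clones = [
    BindingResult("A1", on_complex=True, on_bait=False, on_prey=False),
    BindingResult("B2", on_complex=True, on_bait=True, on_prey=False),
    BindingResult("C3", on_complex=False, on_bait=False, on_prey=False),
]
for c in clones:
    print(c.clone, classify(c))
# A1 complex-specific
# B2 binds individual protein
# C3 non-binder
```

Only clones labelled "complex-specific" under this rule would correspond to the antibodies sought in this Example.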
[0258] Monoclonal antibodies against each of the complexes set forth in columns 1 and
3 of Table II are prepared in a similar manner by mixing the specified proteins together,
immunizing an animal, fusing spleen cells with myeloma cells and isolating clones that
produce antibodies specific for the protein complex, but not for the individual proteins.

Example 8: Modulating compounds/PIM screening 

[0259] Each specific protein-protein complex of columns 1 and 3 of Table II may be used
to screen for modulating compounds. 

[0260] One appropriate set-up for this modulating compound screen may be:

- bait polynucleotide inserted in pB6 or pB20;
- prey polynucleotide inserted in pP6;
- transformation of these two vectors into a permeable yeast cell;
- growth of the transformed yeast cell on medium containing the compound to be tested;
- and observation of the growth of the yeast cells.
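The steps above end with observing yeast growth but do not state how growth is scored. As a minimal sketch, assuming a conventional two-hybrid readout in which the bait-prey interaction drives a reporter required for growth on the selective medium (so that a compound disrupting the interaction would reduce growth), and assuming growth is recorded as a number such as OD600, candidate modulators could be flagged by comparing each treated culture with an untreated control. The function name and the 0.5 ratio cutoff are hypothetical.

```python
from typing import Dict, List


def find_candidate_modulators(growth_with: Dict[str, float],
                              growth_without: float,
                              max_ratio: float = 0.5) -> List[str]:
    """Return compounds whose growth, relative to the untreated control,
    is at or below max_ratio (a hypothetical cutoff).

    growth_with maps compound name -> growth measured with that compound;
    growth_without is growth of the same strain without any compound.
    """
    if growth_without <= 0:
        raise ValueError("control growth must be positive")
    hits = [name for name, g in growth_with.items()
            if g / growth_without <= max_ratio]
    return sorted(hits)


# Hypothetical growth readings for three test compounds
print(find_candidate_modulators(
    {"cmpd_1": 0.9, "cmpd_2": 0.3, "cmpd_3": 0.5}, growth_without=1.0))
# prints ['cmpd_2', 'cmpd_3']
```

Under this rule, a compound that merely slows yeast growth for unrelated reasons would also be flagged, so the ratio is only a first-pass filter.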

[0261] The results obtained from these Examples, as well as the teachings in the
specification, are set forth in the Tables below.

[0262] While the invention has been described in terms of the various preferred
embodiments, the skilled artisan will appreciate that various modifications, substitutions, 
omissions and changes may be made without departing from the scope thereof. 
Accordingly, it is intended that the present invention be limited by the scope of the following 
claims, including equivalents thereof. 
[0263] All patent and non-patent publications cited in this specification, including
the websites set forth on pages 8, 13 and 33, are indicative of the level of skill of
those skilled in the art to which this invention pertains. All these publications and 
patent applications are herein incorporated by reference to the same extent as if 
each individual publication or patent application was specifically and individually 
indicated to be incorporated herein by reference. 


CCGTAGAGGACCTGACAACAACCCTCAACGAGGCAGCCAGTGCTGCTGGGG 

TCGTGGGTGGCATGGTGGACTCCATCACCCAGGCCATCAACCAGCTAGATG 

AAGGACCAATGGGTGAACCAGAAGGTTCCTTCGTGGATTACCAAACAACTAT 

GGTGCGGACAGCCAAGGCCATTGCAGTGACCGTTCAGGAGATGGTTACCAA 

GTCAAACACCAGCCCAGAGGAGCTGGGCCCTCTTGCTAACCAGCTGACCAG 

TGACTATGGCCGTCTGGCCTCGGAGGCCAAGCCTGCAGCGGTGGCTGCTG 

AAAATGAAGAGATAGGTTCCCATATCAAACACCGGGTACAGGAGCTGGGCC 

ATGGCTGTGCCGCTCTGGTCACCAAGGCAGGCGCCCTGCAGTGCAGCCCC 

AGTGATGCCTACACCAAGAAGGAGCTCATAGAGTGTGCCCGGAGAGTCTCT 

GAGAAGGTCTCCCACGTCCTGGCTGCGCTCCAGGCTGGGAATCGTGGCACC 

CAGGCCTGCATCACAGCAGCCAGCGCTGTGTCTGGTATCATTGCTGACCTC 

GACACCACCATCATGTTCGCCACTGCTGGCACGCTCAATCGTGAGGGTACT 

GAAACTTTCGCTGACCACCGGGAGGGCATCCTGAAGACTGCGAAGGTGCTG 

GTGGAGGACACCAAGGTCCTGGTGCAAAACGCAGCTGGGAGCCAGGAGAA 

GTTGGCGCAGGCTGCCCAGTCCTCCGTGGCGACCATCACCCGCCTCGCTGA 

TGTGGTCAAGCTGGGTGCAGCCAGCCTGGGAGCTGAGGACCCTGAGACCC 

AGGTGGTACTAATCAACGCAGTGAAAGATGTAGCCAAAGCCCTGGGAGACC 

TCATCAGTGCAACGAAGGCTGCAGCTGGCAAAGTTGGAGATGACCCTGCTG 

TGTGGCAGCTAAAGAACTCTGCCAAGGTGATGGTGACCAATGTGACATCATT 

GCTTAAGACAGTAAAAGCCGTGGAAGATGAGGCCACCAAAGGCACTCGGGC 

CCTGGAGGCAACCACAGAACACATACGGCAGGAGCTGGCGGTTTTCTGTTC 

CCCAGAGCCACCTGCCAAGACCTCTACCCCAGAAGACTTCATCCGAATGAC 

CAAGGGTATCACCATGGCAACCGCCAAGGCCGTTGCTGCTGGCAATTCCTG 

TCGCCAGGAAGATGTCATTGCCACAGCCAATCTGAGCCGCCGTGCTATTGC 

AGATATGCTTCGGGCTTGCAAGGAAGCAGCTTACCACCCAGAAGTGGCCCC 

TGATGTGCGGCTTCGAGCCCTGCACTATGGCCGGGAGTGTGCCAATGGCTA 

CCTGGAACTGCTGGAC 


CAGTGATGTGCTGGACAAGGCCAGCAGCCTCATTGAGGAGGCGAAAAAGGC 

AGCTGGCCATCCAGGGGACCCTGAGAGCCAGCAGCGGCTTGCCCAGGTGG 

CTAAAGCAGTGACCCAGGCTCTGAACCGCTGTGTCAGCTGCCTACCTGGCC 

AGCGCGATGTGGATAATGCCCTGAGGGCAGTTGGAGATGCCAGCAAGCGAC 

TCCTGAGTGACTCGCTTCCTCCTAGCACTGGGACATTTCAAGAAGCTCAGAG 

CCGGTTGAATGAAGCTGCTGCTGGGCTGAATCAGGCAGCCACAGAACTGGT 

GCAGGCCTCTCGGGGAACCCCTCAGGACCTGGCTCGAGCCTCAGGCCGAT 

TTGGACAGGACTTCAGCACCTTCCTGGAAGCTGGTGTGGAGATGGCAGGCC 

AGGCTCCGAGCCAGGAGGACCGAGCCCAAGTTGTGTCCAACTTGAAGGGCA 

TCTCCATGTCTTCAAGCAAACTTCTTCTGGCTGCCAAGGCCCTGTCCACGGA 

CCCTGCTGCCCCTAACCTCAAGAGTCAGCTGGCTGCAGCTGCCAGGGCAGT 

AACTGACAGCATCAATCAGCTCATCACTATGTGCACCCAGCAGGCACCCGG 